hangc0276 commented on issue #4621:
URL: https://github.com/apache/iceberg/issues/4621#issuecomment-1111133124

   > You should be able to reconstruct an order, but I'm not sure whether you'd 
consider it the _same_ order or even if there is an _original_ record/row order.
   > 
   > Most systems that work with Iceberg don't have an order because they 
process data in parallel tasks. Each task has a record order, but there's no 
order between tasks. For example, when Flink processes data from a Kafka topic, 
there is order within each Kafka partition, but no order across partitions or 
really across Flink tasks. Does Pulsar have a concept of total order over rows?
   > 
   > I said above that you can reconstruct an order. That's because Iceberg 
keeps writes in order. For a given append operation, Iceberg writes the data 
file metadata into a manifest file in the original order. So all you need to do 
is read snapshots sequentially, order data files sequentially, and then read 
records from data files sequentially. We could formalize that a bit so that we 
can keep track of file order within a commit, but I'm skeptical that it is 
valuable given that most systems rely on partial ordering and not total 
ordering.
   
   @rdblue @RussellSpitzer @flyrain Thank you for your patient reply.
   
   For Pulsar, we only need partial ordering, not total ordering. Let me
explain how we integrate a Pulsar topic with Iceberg.
   
   We write all of one topic's messages into one Iceberg table. A Pulsar
topic has many partitions, and for each partition we create an Iceberg
writer to handle that partition's messages. Each message is fetched from one
topic partition and then written into the Iceberg table with additional
metadata fields: <partitionId, ledgerId, entryId> (the pair
<ledgerId, entryId> identifies one message and is called the `MessageId`).
Messages from different topic partitions have different `partitionId`
values, and we don't care about message order across topic partitions.
   
   Messages from the same topic partition are written into the Iceberg table
in the same order in which they are stored in the topic partition. Their
<ledgerId, entryId> metadata fields are strictly increasing, for example:
[<1, 0>, <1, 1>, <1, 2>, <2, 0>, <2, 1>, <3, 0>, <3, 1>].
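
   To make the ordering rule concrete, here is a minimal Python sketch of the
comparison I mean: ledgerId is compared first, and entryId breaks ties. The
`MessageId` class and `is_strictly_increasing` helper are hypothetical names
for illustration only, not Pulsar or Iceberg APIs.

```python
from typing import List, NamedTuple


class MessageId(NamedTuple):
    """Hypothetical stand-in for the <ledgerId, entryId> pair."""
    ledger_id: int
    entry_id: int


def is_strictly_increasing(ids: List[MessageId]) -> bool:
    # Tuples compare field by field: ledger_id first, then entry_id.
    # Each id must be strictly greater than the one before it.
    return all(a < b for a, b in zip(ids, ids[1:]))


# The sequence from the example above.
example = [
    MessageId(1, 0), MessageId(1, 1), MessageId(1, 2),
    MessageId(2, 0), MessageId(2, 1),
    MessageId(3, 0), MessageId(3, 1),
]
```

Note that entryId restarts at 0 whenever ledgerId advances, so neither field
is increasing on its own; only the pair is.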
   
   We read records from the Iceberg table by specifying a `partitionId` and a
`MessageId` range. For example, we specify partitionId 0 and the MessageId
range [<1, 0>, <10, 20>]. The Iceberg reader needs to return the records in
strictly increasing MessageId order.
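
   As a rough sketch of the read semantics we want, assuming the range is
inclusive on both ends (that is my assumption here) and using plain in-memory
rows rather than a real Iceberg scan; `Record` and `read_range` are
hypothetical names:

```python
from typing import Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical row layout: the three metadata fields plus a payload."""
    partition_id: int
    ledger_id: int
    entry_id: int
    payload: str


def read_range(rows: Iterable[Record], partition_id: int,
               start: Tuple[int, int], end: Tuple[int, int]) -> List[Record]:
    # Keep rows of the requested partition whose (ledger_id, entry_id)
    # falls inside [start, end], both bounds inclusive (an assumption).
    selected = [
        r for r in rows
        if r.partition_id == partition_id
        and start <= (r.ledger_id, r.entry_id) <= end
    ]
    # Return them in strictly increasing MessageId order.
    return sorted(selected, key=lambda r: (r.ledger_id, r.entry_id))
```

For example, `read_range(rows, 0, (1, 0), (10, 20))` would correspond to the
request described above.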
   
   The reader needs to return records ordered by the <ledgerId, entryId> key
pair within a partition (a partial order); otherwise, it is hard for Pulsar
to reorder the records returned from the Iceberg reader. Pulsar can preserve
write order within a partition.
   
   The Iceberg writer writes records sequentially into multiple Parquet
files. On the read side, we specify a partitionId and a MessageId range. The
reader can keep the record order within one Parquet file, but it is hard to
keep the order across multiple Parquet files.
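
   One way the cross-file order could be restored on the reader side, as a
sketch only: if each file is already sorted by MessageId (which holds when a
single writer fills files sequentially), a k-way merge of the per-file
streams yields a globally ordered stream for the partition. The file names
and the `read_in_order` helper below are hypothetical, and the files are
modeled as in-memory lists rather than real Parquet files.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# (ledger_id, entry_id, payload); a hypothetical row layout.
Row = Tuple[int, int, str]


def read_in_order(files: Dict[str, List[Row]]) -> Iterator[Row]:
    # Each list is assumed to be sorted by (ledger_id, entry_id) already.
    # heapq.merge lazily merges the sorted inputs, holding only one
    # pending row per file at a time.
    return heapq.merge(*files.values(), key=lambda row: (row[0], row[1]))


# Files listed in an arbitrary order, as a planner might return them.
files = {
    "file-b.parquet": [(2, 0, "c"), (2, 1, "d")],
    "file-a.parquet": [(1, 0, "a"), (1, 1, "b"), (1, 2, "b2")],
    "file-c.parquet": [(3, 0, "e"), (3, 1, "f")],
}
```

When the files' MessageId ranges do not overlap, simply sorting the files by
their smallest MessageId and reading them one after another would give the
same result; the merge also covers the overlapping case.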


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

