rdblue commented on issue #4621: URL: https://github.com/apache/iceberg/issues/4621#issuecomment-1110336966
You should be able to reconstruct an order, but I'm not sure whether you'd consider it the _same_ order, or even whether there is an _original_ record/row order. Most systems that work with Iceberg don't have an order because they process data in parallel tasks. Each task has a record order, but there's no order between tasks. For example, when Flink processes data from a Kafka topic, there is order within each Kafka partition, but no order across partitions or really across Flink tasks. Does Pulsar have a concept of total order over rows?

I said above that you can reconstruct an order. That's because Iceberg keeps writes in order. For a given append operation, Iceberg writes the data file metadata into a manifest file in the original order. So all you need to do is read snapshots sequentially, order data files sequentially, and then read records from data files sequentially.

We could formalize that a bit so that we can keep track of file order within a commit, but I'm skeptical that it is valuable given that most systems rely on partial ordering and not total ordering.
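
To make the reconstruction concrete, here is a minimal sketch of the three-level traversal (snapshots in commit order, then data files in the order they were written to the manifest, then records in file order). It is a toy in-memory model, not Iceberg's API: `Snapshot`, `DataFile`, `sequence_number`, `added_files`, and `reconstruct_order` are hypothetical names chosen for illustration, and the sketch assumes append-only commits.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file; records are kept in write order."""
    path: str
    records: List[str]


@dataclass
class Snapshot:
    """Hypothetical stand-in for one append commit.

    added_files is assumed to preserve the order in which the data file
    metadata was written into the manifest.
    """
    sequence_number: int
    added_files: List[DataFile] = field(default_factory=list)


def reconstruct_order(snapshots: Iterable[Snapshot]) -> Iterator[str]:
    """Yield records by walking snapshots, then files, then rows, in order."""
    # 1. Read snapshots sequentially (oldest commit first).
    for snapshot in sorted(snapshots, key=lambda s: s.sequence_number):
        # 2. Keep data files in the order recorded for that commit.
        for data_file in snapshot.added_files:
            # 3. Read records from each data file sequentially.
            yield from data_file.records


# Two commits, deliberately listed out of order.
snapshots = [
    Snapshot(2, [DataFile("c.parquet", ["r5"])]),
    Snapshot(1, [DataFile("a.parquet", ["r1", "r2"]),
                 DataFile("b.parquet", ["r3", "r4"])]),
]
ordered = list(reconstruct_order(snapshots))
print(ordered)
```

Note that this only yields _an_ order. If several parallel tasks contributed files to the same commit, the order among those files reflects how the commit was assembled, not a total order over the source rows.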
