rdblue opened a new pull request #1222: URL: https://github.com/apache/iceberg/pull/1222
This adds an Avro `ValueReader` that returns the position of each row within a file. The position reader's initial position is set from a callback that returns the starting row position of the split being read. The callback is passed to classes that implement a new interface, `SupportsRowPosition`. A callback is used so that the starting row position, which is expensive to calculate, is only computed when a position reader is actually present.

Finding the row position at the start of a split requires scanning through the Avro file stream. `AvroIO` now includes a utility method that keeps track of the number of rows in each Avro block and seeks past the block content until the next block starts after the given split starting point. The method validates Avro sync bytes to ensure the count is accurate.
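
The lazy callback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the method name `setRowPositionSupplier`, the `PositionReader` class, and the use of `java.util.function.Supplier<Long>` are assumptions made for the example, and the real `SupportsRowPosition` signature may differ.

```java
import java.util.function.Supplier;

public class RowPositionSketch {
  // Hypothetical stand-in for the PR's SupportsRowPosition interface.
  // The method name and parameter type are assumptions for illustration.
  public interface SupportsRowPosition {
    void setRowPositionSupplier(Supplier<Long> posSupplier);
  }

  // Hypothetical position reader: each call to read() returns the position
  // of the next row, starting from the split's starting row position.
  public static class PositionReader implements SupportsRowPosition {
    private Supplier<Long> startSupplier = () -> 0L;
    private boolean initialized = false;
    private long nextPosition = 0L;

    @Override
    public void setRowPositionSupplier(Supplier<Long> posSupplier) {
      this.startSupplier = posSupplier;
      this.initialized = false;
    }

    public long read() {
      if (!initialized) {
        // The expensive starting position is computed only on first use.
        nextPosition = startSupplier.get();
        initialized = true;
      }
      return nextPosition++;
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    PositionReader reader = new PositionReader();
    // Stand-in for the expensive scan that finds the split's first row.
    reader.setRowPositionSupplier(() -> {
      calls[0]++;
      return 100L;
    });

    System.out.println(calls[0]);      // prints 0: nothing computed yet
    System.out.println(reader.read()); // prints 100
    System.out.println(reader.read()); // prints 101
    System.out.println(calls[0]);      // prints 1: supplier invoked once
  }
}
```

If no reader implements the interface, the supplier is never invoked, which is the point of passing a callback instead of a precomputed value.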

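The block scan that finds a split's starting row position might look something like the sketch below. It is a simplified model under stated assumptions, not the `AvroIO` method from this PR: the file header is omitted, block row counts and sizes are stored as fixed 8-byte longs (real Avro container files encode them as variable-length zigzag longs), and the names `rowsBeforeSplit` and `buildFile` are hypothetical. The boundary rule (a block counts if it starts before the split start) is also an assumption based on the description above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BlockScanSketch {
  public static final int SYNC_SIZE = 16; // Avro sync markers are 16 bytes

  // Counts the rows in all blocks that start before splitStart.
  // Simplified block layout (assumption): [count:8][size:8][data:size][sync:16]
  public static long rowsBeforeSplit(ByteBuffer file, byte[] sync, long splitStart) {
    ByteBuffer in = file.duplicate(); // leave the caller's position untouched
    long rows = 0;
    while (in.remaining() > 0 && in.position() < splitStart) {
      long count = in.getLong();
      long size = in.getLong();
      in.position(in.position() + (int) size); // seek past the block content
      byte[] marker = new byte[SYNC_SIZE];
      in.get(marker);
      // Validate the sync bytes so a misaligned scan fails instead of miscounting.
      if (!Arrays.equals(marker, sync)) {
        throw new IllegalStateException("Invalid sync marker before offset " + in.position());
      }
      rows += count;
    }
    return rows;
  }

  // Test helper (hypothetical): builds a file of zero-filled blocks.
  public static ByteBuffer buildFile(byte[] sync, long[] counts, int[] sizes) {
    int total = 0;
    for (int size : sizes) {
      total += 8 + 8 + size + SYNC_SIZE;
    }
    ByteBuffer out = ByteBuffer.allocate(total);
    for (int i = 0; i < counts.length; i++) {
      out.putLong(counts[i]);
      out.putLong(sizes[i]);
      out.put(new byte[sizes[i]]);
      out.put(sync);
    }
    out.flip();
    return out;
  }

  public static void main(String[] args) {
    byte[] sync = new byte[SYNC_SIZE];
    Arrays.fill(sync, (byte) 7);
    // Three blocks starting at byte offsets 0, 42, and 94.
    ByteBuffer file = buildFile(sync, new long[] {3, 5, 2}, new int[] {10, 20, 30});

    System.out.println(rowsBeforeSplit(file, sync, 0));  // prints 0
    System.out.println(rowsBeforeSplit(file, sync, 42)); // prints 3
    System.out.println(rowsBeforeSplit(file, sync, 43)); // prints 8
  }
}
```

Only the count and size fields are read for each block; the row data itself is skipped, so the cost of the scan grows with the number of blocks before the split, not with the number of rows.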