rdblue commented on a change in pull request #1222:
URL: https://github.com/apache/iceberg/pull/1222#discussion_r459712376



##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroIO.java
##########
@@ -131,4 +144,55 @@ public boolean markSupported() {
       return stream.markSupported();
     }
   }
+
+  static long findStartingRowPos(Supplier<SeekableInputStream> open, long start) {
+    try (SeekableInputStream in = open.get()) {
+      // use a direct decoder that will not buffer so the position of the input stream is accurate
+      BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(in, null);
+
+      // an Avro file's layout looks like this:
+      //   header|block|block|...
+      // the header contains:
+      //   magic|string-map|sync
+      // each block consists of:
+      //   row-count|compressed-size-in-bytes|block-bytes|sync
+
+      // it is necessary to read the header here because this is the only way to get the expected file sync bytes
+      byte[] magic = MAGIC_READER.read(decoder, null);
+      if (!Arrays.equals(AVRO_MAGIC, magic)) {
+        throw new InvalidAvroMagicException("Not an Avro file");
+      }
+
+      META_READER.read(decoder, null); // ignore the file metadata, it isn't needed
+      byte[] fileSync = SYNC_READER.read(decoder, null);
+
+      // the while loop reads row counts and seeks past the block bytes until the next sync pos is >= start, which
+      // indicates that the next sync is the start of the split.
+      byte[] blockSync = new byte[16];
+      long totalRows = 0;
+      long nextSyncPos = in.getPos();
+
+      while (nextSyncPos < start) {

Review comment:
       The fix is to catch `EOFException` and return `totalRows`, the number of rows counted so far.
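
A minimal sketch of that pattern, for illustration only. It does not use Avro or Iceberg classes: `RowPosSketch`, `buildFile`, and the toy layout (fixed-width longs, no header, no compression) are assumptions made up for this example. The point it shows is that when `start` lies beyond the last block, the read hits end-of-file inside the loop, and catching `EOFException` lets the method return the rows counted up to that point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the Avro reader: fixed-width longs, no header, no compression.
class RowPosSketch {
  static final int SYNC_SIZE = 16;

  // Builds a toy file of blocks laid out as: row-count|size-in-bytes|block-bytes|sync
  static byte[] buildFile(long[] rowCounts, int payloadSize) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (long rowCount : rowCounts) {
        out.writeLong(rowCount);
        out.writeLong(payloadSize);
        out.write(new byte[payloadSize + SYNC_SIZE]); // zeroed payload and sync marker
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Sums the row counts of every block that begins before `start`.
  static long findStartingRowPos(byte[] file, long start) {
    long totalRows = 0;
    long nextSyncPos = 0; // the toy layout has no header, so the first block starts at 0
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      while (nextSyncPos < start) {
        long rowCount = in.readLong(); // throws EOFException once the data runs out
        long size = in.readLong();
        in.readFully(new byte[(int) size + SYNC_SIZE]); // skip block bytes and sync
        totalRows += rowCount;
        nextSyncPos += 2 * Long.BYTES + size + SYNC_SIZE;
      }
      return totalRows;
    } catch (EOFException e) {
      // start is past the end of the file: every complete block has been counted
      return totalRows;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With two blocks of 3 and 5 rows and 10-byte payloads, each block occupies 42 bytes, so `findStartingRowPos(file, 43)` returns 8. `findStartingRowPos(file, 1000)` also returns 8 rather than failing, because the `EOFException` raised after the second block is caught.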



