[ https://issues.apache.org/jira/browse/AVRO-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427199#comment-16427199 ]
mukesh katariya commented on AVRO-2168:
---------------------------------------

This is strange behaviour. In the DataFileStream#hasNextBlock API, reading via vin.readLong gives blockRemaining = 59 (the block count) and blockSize = 151, yet only 120 bytes are left in the file to read.

{code:java}
DataBlock nextRawBlock(DataBlock reuse) throws IOException {
  if (!hasNextBlock()) {
    throw new NoSuchElementException();
  }
  if (reuse != null)
    System.out.println("reuse.data.length = " + reuse.data.length);
  else
    System.out.println("reuse is null");
  if (reuse == null || reuse.data.length < (int) blockSize) {
    reuse = new DataBlock(blockRemaining, (int) blockSize);
  } else {
    reuse.numEntries = blockRemaining;
    reuse.blockSize = (int) blockSize;
  }
  // throws if it can't read the size requested
  vin.readFixed(reuse.data, 0, reuse.blockSize);
  vin.readFixed(syncBuffer);
  availableBlock = false;
  if (!Arrays.equals(syncBuffer, header.sync))
    throw new IOException("Invalid sync!");
  return reuse;
}
{code}

The line below hits EOF when it tries to read 151 bytes; the exception propagates up and the calling API simply returns false:

{code:java}
vin.readFixed(reuse.data, 0, reuse.blockSize);
{code}

hasNext swallows the EOFException and returns false, and yet the data up to that point is read correctly. This is the issue: it is unclear why there is no sync marker at the end of the file, and why the block header declares 151 bytes while the content is shorter, but the program is still able to process the records.
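The short-read failure described above can be reproduced without Avro itself: asking a stream for exactly N bytes when fewer remain raises EOFException, just as vin.readFixed does here. A minimal sketch using only the JDK (class and method names are illustrative, not from the Avro codebase):

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedBlockDemo {

    // Mimics vin.readFixed(buf, 0, blockSize): read exactly blockSize
    // bytes, or throw EOFException if the stream ends early.
    static byte[] readBlock(DataInputStream in, int blockSize) throws IOException {
        byte[] buf = new byte[blockSize];
        in.readFully(buf, 0, blockSize); // throws EOFException on a short read
        return buf;
    }

    // Mirrors the hasNext pattern: a truncated block is reported as
    // "no more data" rather than surfacing the error to the caller.
    static boolean tryReadBlock(byte[] fileBytes, int declaredBlockSize) {
        try {
            readBlock(new DataInputStream(new ByteArrayInputStream(fileBytes)),
                      declaredBlockSize);
            return true;
        } catch (EOFException e) {
            return false; // header promised more bytes than remain in the file
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] only120Bytes = new byte[120];
        // Block header claims 151 bytes, but only 120 remain -> EOF.
        System.out.println(tryReadBlock(only120Bytes, 151)); // false
        System.out.println(tryReadBlock(only120Bytes, 120)); // true
    }
}
{code}

This matches the numbers in the report: a declared blockSize of 151 against 120 available bytes yields EOF, which the caller then reports as a clean end of stream.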
{code:java}
@Override
public boolean hasNext() {
  try {
    if (blockRemaining == 0) {
      // check that the previous block was finished
      if (null != datumIn) {
        boolean atEnd = datumIn.isEnd();
        if (!atEnd) {
          throw new IOException("Block read partially, the data may be corrupt");
        }
      }
      if (hasNextBlock()) {
        System.out.println("Process if has Next block = true");
        System.out.println("read next raw block..");
        block = nextRawBlock(block);
        System.out.println("decompress block");
        block.decompressUsing(codec);
        blockBuffer = block.getAsByteBuffer();
        datumIn = DecoderFactory.get().binaryDecoder(
            blockBuffer.array(),
            blockBuffer.arrayOffset() + blockBuffer.position(),
            blockBuffer.remaining(),
            datumIn);
      }
    }
    return blockRemaining != 0;
  } catch (EOFException e) { // at EOF
    return false;
  } catch (IOException e) {
    throw new AvroRuntimeException(e);
  }
}
{code}

> Sync bytes not appended to end of file.
> ---------------------------------------
>
>                 Key: AVRO-2168
>                 URL: https://issues.apache.org/jira/browse/AVRO-2168
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.2
>            Reporter: mukesh katariya
>            Priority: Major
>         Attachments: issue.PNG
>
> I have a snappy codec file which does not have the 16 sync bytes appended to the end of the file.
> Is there any reason why this happens? As you can see, there are 16 bytes highlighted in yellow at the end of a block, but after that no sync bytes are found until the end of the file.
> !issue.PNG|width=1196,height=164!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
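For reference, the sync check that nextRawBlock performs with Arrays.equals(syncBuffer, header.sync) can be sketched in isolation. This is a hedged, JDK-only illustration (syncMatches and the variable names are hypothetical, not Avro's API); it shows why a file truncated before its 16 sync bytes, as in this report, fails the comparison:

{code:java}
import java.util.Arrays;

public class SyncMarkerCheck {
    static final int SYNC_SIZE = 16; // Avro data files use a 16-byte sync marker

    // Returns true if the 16 bytes at 'offset' match the file's sync
    // marker, mirroring the Arrays.equals(syncBuffer, header.sync) check.
    static boolean syncMatches(byte[] fileBytes, int offset, byte[] headerSync) {
        if (offset + SYNC_SIZE > fileBytes.length) {
            return false; // truncated file: no room left for a sync marker
        }
        byte[] candidate = Arrays.copyOfRange(fileBytes, offset, offset + SYNC_SIZE);
        return Arrays.equals(candidate, headerSync);
    }

    public static void main(String[] args) {
        byte[] sync = new byte[SYNC_SIZE];
        Arrays.fill(sync, (byte) 0x5A); // arbitrary example marker

        byte[] goodFile = new byte[32];
        System.arraycopy(sync, 0, goodFile, 16, SYNC_SIZE); // marker after a 16-byte block
        System.out.println(syncMatches(goodFile, 16, sync)); // true

        byte[] truncated = Arrays.copyOf(goodFile, 20); // sync bytes cut off
        System.out.println(syncMatches(truncated, 16, sync)); // false
    }
}
{code}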