[
https://issues.apache.org/jira/browse/AVRO-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427199#comment-16427199
]
mukesh katariya commented on AVRO-2168:
---------------------------------------
It's strange behaviour in the DataFileStream.hasNextBlock API.
vin#readLong returns blockRemaining = 59, then blockSize = 151
(so blockCount = 59), but there are only 120 bytes left to read in the stream.
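For reference, hasNextBlock reads those two values roughly like this (a simplified paraphrase of the 1.8.x DataFileStream code, not an exact copy):
{code:java}
// Simplified paraphrase of DataFileStream.hasNextBlock() -- may not match the
// 1.8.2 source line for line. The two readLong() calls are where
// blockRemaining = 59 and blockSize = 151 come from.
private boolean hasNextBlock() {
  try {
    if (availableBlock) return true;
    if (vin.isEnd()) return false;            // clean end of file
    blockRemaining = vin.readLong();          // record count of the next block
    blockSize = vin.readLong();               // serialized size of the next block
    if (blockSize > Integer.MAX_VALUE || blockSize < 0) {
      throw new IOException("Block size invalid or too large: " + blockSize);
    }
    blockCount = blockRemaining;
    availableBlock = true;
    return true;
  } catch (EOFException eof) {
    return false;                             // no further block header
  } catch (IOException e) {
    throw new AvroRuntimeException(e);
  }
}
{code}
nextRawBlock (quoted below, with my debug prints added) then tries to read the full block: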
{code:java}
DataBlock nextRawBlock(DataBlock reuse) throws IOException {
  if (!hasNextBlock()) {
    throw new NoSuchElementException();
  }
  if (reuse != null)
    System.out.println("reuse.data.length = " + reuse.data.length);
  else
    System.out.println("reuse is null");
  if (reuse == null || reuse.data.length < (int) blockSize) {
    reuse = new DataBlock(blockRemaining, (int) blockSize);
  } else {
    reuse.numEntries = blockRemaining;
    reuse.blockSize = (int) blockSize;
  }
  // throws if it can't read the size requested
  vin.readFixed(reuse.data, 0, reuse.blockSize);
  vin.readFixed(syncBuffer);
  availableBlock = false;
  if (!Arrays.equals(syncBuffer, header.sync))
    throw new IOException("Invalid sync!");
  return reuse;
}
{code}
The line below hits EOF when it tries to read 151 bytes; the exception is
thrown and the calling API just returns false.
{code:java}
vin.readFixed(reuse.data, 0, reuse.blockSize);{code}
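To isolate the mechanism, here is a small standalone sketch (not taken from the issue; the 2-byte input and the 151-byte request are only illustrative values) showing that readFixed throws EOFException when the requested length exceeds the remaining input:
{code:java}
import java.io.EOFException;
import java.io.IOException;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class ReadFixedEofDemo {
  public static void main(String[] args) throws IOException {
    // Only 2 bytes of input, but ask readFixed for 151 of them -- mirrors a
    // block header that claims blockSize = 151 while far fewer bytes remain.
    byte[] onlyTwoBytes = new byte[] { 1, 2 };
    BinaryDecoder dec = DecoderFactory.get().binaryDecoder(onlyTwoBytes, null);
    byte[] buf = new byte[151];
    try {
      dec.readFixed(buf, 0, buf.length);
    } catch (EOFException e) {
      // This is the exception that DataFileStream.hasNext() catches and
      // turns into a silent "return false".
      System.out.println("EOFException on partial read: " + e);
    }
  }
}
{code}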
The API below catches the EOFException and returns false, and the data read up
to that point is still processed correctly. This is an issue. I am not sure why
there is no sync marker at the EOF, why the data block size says 151 when the
content is shorter than that, and why the program is still able to process the
records.
{code:java}
@Override
public boolean hasNext() {
  try {
    if (blockRemaining == 0) {
      // check that the previous block was finished
      if (null != datumIn) {
        boolean atEnd = datumIn.isEnd();
        if (!atEnd) {
          throw new IOException("Block read partially, the data may be corrupt");
        }
      }
      if (hasNextBlock()) {
        System.out.println("Process if has Next block = true");
        System.out.println("read next raw block..");
        block = nextRawBlock(block);
        System.out.println("decompress block");
        block.decompressUsing(codec);
        blockBuffer = block.getAsByteBuffer();
        datumIn = DecoderFactory.get().binaryDecoder(
            blockBuffer.array(), blockBuffer.arrayOffset() + blockBuffer.position(),
            blockBuffer.remaining(), datumIn);
      }
    }
    return blockRemaining != 0;
  } catch (EOFException e) { // at EOF
    return false;
  } catch (IOException e) {
    throw new AvroRuntimeException(e);
  }
}
{code}
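To show the effect end to end, here is a hedged repro sketch (the file name and the use of generic records are my assumptions): iterating over a container file whose last block is truncated simply stops early, with no error surfacing to the caller.
{code:java}
import java.io.File;
import java.io.FileInputStream;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class TruncatedFileRead {
  public static void main(String[] args) throws Exception {
    File f = new File("truncated-snappy.avro");   // assumed path to the bad file
    try (DataFileStream<GenericRecord> in =
             new DataFileStream<>(new FileInputStream(f), new GenericDatumReader<>())) {
      long count = 0;
      while (in.hasNext()) {   // returns false at the truncated block; no exception
        in.next();
        count++;
      }
      // The loop ends as if the file were complete; records in the truncated
      // block are silently dropped.
      System.out.println("records read = " + count);
    }
  }
}
{code}
In other words, the truncation looks exactly like a normal end of file to the consumer.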
> Sync bytes not appended to end of file.
> ---------------------------------------
>
> Key: AVRO-2168
> URL: https://issues.apache.org/jira/browse/AVRO-2168
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.2
> Reporter: mukesh katariya
> Priority: Major
> Attachments: issue.PNG
>
>
> I have a snappy-codec file which does not have the 16 sync bytes appended to
> the end of the file.
> Any reason why this happens? As you can see, there are 16 bytes highlighted
> in yellow at the end of a block, but after that the sync bytes are not found
> until the end of the file.
> !issue.PNG|width=1196,height=164!