[ 
https://issues.apache.org/jira/browse/AVRO-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427199#comment-16427199
 ] 

mukesh katariya commented on AVRO-2168:
---------------------------------------

It's strange behaviour:

In the DataFileStream.hasNextBlock API:
blockRemaining (read via vin.readLong) = 59
blockSize (read via vin.readLong) = 151
blockCount = 59
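
For reference, per the Avro spec each block starts with two longs (object count, then byte size), both written as zig-zag variable-length integers. Below is a minimal stand-alone sketch of that decoding, to show which header bytes would give 59 and 151. The class ZigZagLong and its methods are made up for illustration; this is not Avro's BinaryDecoder, and it skips the overflow checks a real decoder would need.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Avro.
class ZigZagLong {
    /** Decodes one zig-zag variable-length long (7 payload bits per byte, low bits first). */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a varint");
            }
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        // Undo the zig-zag mapping: even values are non-negative, odd values negative.
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Returns {objectCount, blockSizeInBytes} read from the start of a block. */
    static long[] decodeBlockHeader(byte[] bytes) {
        try {
            InputStream in = new ByteArrayInputStream(bytes);
            long count = readLong(in);
            long size = readLong(in);
            return new long[] {count, size};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0x76 is zig-zag for 59; 0xAE 0x02 is zig-zag for 151.
        long[] header = decodeBlockHeader(new byte[] {0x76, (byte) 0xAE, 0x02});
        System.out.println("count = " + header[0] + ", size = " + header[1]);
    }
}
```

So a header of 76 AE 02 in the hex dump would match the values printed above; whether the file actually contains those bytes at that offset is something I have not verified here.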

There are only 120 bytes available to read, yet the method below asks for 151:
{code:java}
DataBlock nextRawBlock(DataBlock reuse) throws IOException {
  if (!hasNextBlock()) {
    throw new NoSuchElementException();
  }

  
  // debug output
  if (reuse != null)
    System.out.println("reuse.data.length = " + reuse.data.length);
  else
    System.out.println("reuse is null");

  if (reuse == null || reuse.data.length < (int) blockSize) {
    reuse = new DataBlock(blockRemaining, (int) blockSize);
  } else {
    reuse.numEntries = blockRemaining;
    reuse.blockSize = (int)blockSize;
  }
  // throws if it can't read the size requested
  vin.readFixed(reuse.data, 0, reuse.blockSize);
  vin.readFixed(syncBuffer);
  availableBlock = false;
  if (!Arrays.equals(syncBuffer, header.sync))
    throw new IOException("Invalid sync!");
  return reuse;
}
{code}
The line below hits EOF when it tries to read 151 bytes. The exception is 
thrown and the calling API just returns false.

 
{code:java}
  vin.readFixed(reuse.data, 0, reuse.blockSize);{code}
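To make the failure mode concrete, here is a small stand-alone sketch using only java.io. It assumes vin.readFixed is all-or-nothing, as the comment in the method says, and uses DataInputStream.readFully as a stand-in for it. The class ShortBlockRead and the method canReadBlock are hypothetical names, not Avro code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the readFixed call; not Avro code.
class ShortBlockRead {
    /**
     * Tries to read declaredSize bytes from a stream that holds only
     * `available` bytes. Returns true if the whole block could be read,
     * false if the stream ended first.
     */
    static boolean canReadBlock(int available, int declaredSize) {
        byte[] block = new byte[declaredSize];
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(new byte[available]))) {
            // readFully either fills the requested range or throws EOFException.
            in.readFully(block, 0, declaredSize);
            return true;
        } catch (EOFException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canReadBlock(120, 151)); // declared size larger than remaining bytes
        System.out.println(canReadBlock(151, 151)); // declared size matches remaining bytes
    }
}
```

With 120 bytes available and 151 requested, the read fails with EOFException before the 16 sync bytes are ever looked at.
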
The API below catches the EOFException and returns false, and yet the data is 
read correctly. This is an issue. I am not sure why there is a sync header at 
the EOF.

Why does the data block size say 151 when the content is shorter than that, 
and why is the program still able to process the record?

 
{code:java}
@Override
public boolean hasNext() {
 
  try {
 
    if (blockRemaining == 0) {
      // check that the previous block was finished
      if (null != datumIn) {
        boolean atEnd = datumIn.isEnd();
        if (!atEnd) {
          throw new IOException("Block read partially, the data may be corrupt");
        }
      }
      if (hasNextBlock()) {
        System.out.println("Process if has Next block = true");
        System.out.println("read next raw block..");
        block = nextRawBlock(block);
        System.out.println("decompress block");
        block.decompressUsing(codec); 
        blockBuffer = block.getAsByteBuffer();
   
        datumIn = DecoderFactory.get().binaryDecoder(
            blockBuffer.array(), blockBuffer.arrayOffset() +
            blockBuffer.position(), blockBuffer.remaining(), datumIn);
      }
    }
 
    return blockRemaining != 0;
  } catch (EOFException e) {                    // at EOF
    return false;
  } catch (IOException e) {
    throw new AvroRuntimeException(e);
  }
}
{code}
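
The catch (EOFException e) branch above treats every EOF as a normal end of file. As a rough sketch of the difference, assuming a toy format where each block is a one-byte length followed by that many bytes (this is not the Avro container format, and StrictBlockScan is a hypothetical name), a reader could separate a clean EOF between blocks from an EOF inside a block:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy format for illustration only: [1-byte length][payload] repeated.
class StrictBlockScan {
    /**
     * Counts the complete blocks in `file`.
     * strict = false: an EOF inside a block is swallowed, like hasNext() above.
     * strict = true:  an EOF inside a block is reported as truncation.
     */
    static int countBlocks(byte[] file, boolean strict) {
        int blocks = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            while (true) {
                int size = in.read();
                if (size < 0) {
                    return blocks; // clean EOF on a block boundary
                }
                try {
                    in.readFully(new byte[size]);
                } catch (EOFException e) {
                    if (strict) {
                        throw new IllegalStateException(
                            "Block declares " + size + " bytes but the stream ended early", e);
                    }
                    return blocks; // lenient: silently stop
                }
                blocks++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One complete 3-byte block, then a block declaring 151 bytes with only 120 present. */
    static byte[] sample() {
        byte[] file = new byte[1 + 3 + 1 + 120];
        file[0] = 3;
        file[4] = (byte) 151;
        return file;
    }

    public static void main(String[] args) {
        System.out.println("lenient: " + countBlocks(sample(), false));
        try {
            countBlocks(sample(), true);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

In lenient mode the short last block just disappears from the count, which is the kind of silent behaviour I am asking about; whether DataFileStream should behave like the strict mode is a separate question.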
 

 

> Sync bytes not appended to end of file.
> ---------------------------------------
>
>                 Key: AVRO-2168
>                 URL: https://issues.apache.org/jira/browse/AVRO-2168
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.2
>            Reporter: mukesh katariya
>            Priority: Major
>         Attachments: issue.PNG
>
>
> I have a snappy codec file which does not have the 16 sync bytes appended 
> to the end of the file.
> Is there any reason why this happens? As you can see, there are 16 bytes in 
> yellow at the end of a block, but after that the sync bytes are not found 
> until the end of the file.
> !issue.PNG|width=1196,height=164!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
