[ https://issues.apache.org/jira/browse/AVRO-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-380:
-----------------------------

    Attachment: AVRO-380.patch

Updated patch:

* Block size and block record count are longs in both the format and the code.
  This reader implementation will throw an exception if a block size is larger
  than Integer.MAX_VALUE.
* When a block is read from the underlying stream, the reader checks that the
  number of bytes read equals the block size, i.e. that the block is not
  truncated (sketched below).
* When a block is finished (blockCount records read), the reader checks that
  all of its bytes have been consumed.  This is currently done by forcing an
  EOFException, which is ugly; I plan to change it later as an optimization,
  along with other planned changes.
* DeflateCodec now writes and reads RFC-1951 'raw' deflate, in line with the
  documentation (sketched below).
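
For reference, roughly the kind of block-descriptor handling described above,
as a sketch.  Class and method names are illustrative, not the actual code in
the attached patch; only the Decoder calls are real Avro API.

{code:java}
import java.io.EOFException;
import java.io.IOException;
import org.apache.avro.io.Decoder;

class BlockReadSketch {
  /** Reads one block descriptor and its data, validating the byte count. */
  static byte[] readBlock(Decoder in) throws IOException {
    long blockCount = in.readLong();   // record count -- a long in the format
    long blockSize = in.readLong();    // byte size of the block data -- also a long

    // The format allows a long, but this reader only handles blocks that fit
    // in a Java array, hence the Integer.MAX_VALUE check.
    if (blockSize < 0 || blockSize > Integer.MAX_VALUE) {
      throw new IOException("Invalid or too-large block size: " + blockSize);
    }

    byte[] data = new byte[(int) blockSize];
    try {
      // Read exactly blockSize bytes; a short read means the block is truncated.
      in.readFixed(data, 0, (int) blockSize);
    } catch (EOFException e) {
      throw new IOException("Block truncated: expected " + blockSize + " bytes", e);
    }
    // blockCount records would then be decoded from 'data'; once the last one
    // is read, the buffer should be fully consumed.
    return data;
  }
}
{code}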
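
And for the codec change: with java.util.zip, RFC-1951 'raw' deflate just means
constructing the Deflater/Inflater with nowrap=true, so no zlib (RFC-1950)
header or checksum is written or expected.  A minimal sketch, not the patch's
DeflateCodec itself:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

class RawDeflateSketch {
  static byte[] compress(byte[] data) throws IOException {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw deflate
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater);
    dos.write(data);
    dos.finish();               // flush remaining compressed bytes to 'out'
    deflater.end();             // release native resources
    return out.toByteArray();
  }

  static byte[] decompress(byte[] compressed) throws IOException {
    Inflater inflater = new Inflater(true);  // true = expect raw deflate, no zlib wrapper
    InflaterInputStream in =
        new InflaterInputStream(new ByteArrayInputStream(compressed), inflater);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    inflater.end();
    return out.toByteArray();
  }
}
{code}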


> Avro Container File format change:  add block size to block descriptor
> ----------------------------------------------------------------------
>
>                 Key: AVRO-380
>                 URL: https://issues.apache.org/jira/browse/AVRO-380
>             Project: Avro
>          Issue Type: Improvement
>          Components: doc, java, spec
>    Affects Versions: 1.3.0
>            Reporter: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-380.patch, AVRO-380.patch
>
>
> The new file format in AVRO-160 limits a few use cases that I have found to 
> be important.
> A block currently contains a count of the number of records, the block data, 
> and a sync marker.  
> This change would add the block size, in bytes, alongside the number of 
> records.
> This allows efficient access to a block's data without the need to decode the 
> data into individual Datums, which is useful for various use cases.  
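
To illustrate that last point: with the byte size in the block descriptor, a
reader can skip or copy a block's raw data without decoding any records.  A
rough sketch against the proposed layout (record count, byte size, data,
16-byte sync marker); only the Decoder calls are real Avro API, the rest is
illustrative:

{code:java}
import java.io.IOException;
import java.util.Arrays;
import org.apache.avro.io.Decoder;

class BlockSkipSketch {
  /** Skips one block without decoding datums, returning its record count. */
  static long skipBlock(Decoder in, byte[] expectedSync) throws IOException {
    long blockCount = in.readLong();   // number of records in the block
    long blockSize = in.readLong();    // byte size of the serialized block data
    if (blockSize < 0 || blockSize > Integer.MAX_VALUE) {
      throw new IOException("Cannot skip block of size " + blockSize);
    }
    in.skipFixed((int) blockSize);     // jump over the data without decoding it

    byte[] sync = new byte[16];        // the 16-byte sync marker follows the block
    in.readFixed(sync, 0, sync.length);
    if (!Arrays.equals(sync, expectedSync)) {
      throw new IOException("Invalid sync marker");
    }
    return blockCount;
  }
}
{code}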

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
