[ https://issues.apache.org/jira/browse/AVRO-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Carey updated AVRO-380:
-----------------------------

    Attachment: AVRO-380.patch

Updated patch:
* Block size and block record count are longs in both the format and the code. This reader implementation will throw an exception if a block size is larger than Integer.MAX_VALUE.
* When a block is read from the underlying stream, the reader checks that the number of bytes read is equal to the block size (i.e. that the block is not truncated).
* When a block is finished (blockCount records read), the reader checks that all of its bytes have been consumed. This is currently done by forcing an EOFException, which is ugly; I plan to change it later as an optimization along with other planned changes.
* DeflateCodec now writes and reads RFC-1951 'raw' deflate, in line with the documentation.

(Illustrative sketches of the block handling, the raw-deflate behavior, and the block-skipping use case follow after the quoted issue below.)

> Avro Container File format change: add block size to block descriptor
> ----------------------------------------------------------------------
>
>                 Key: AVRO-380
>                 URL: https://issues.apache.org/jira/browse/AVRO-380
>             Project: Avro
>          Issue Type: Improvement
>          Components: doc, java, spec
>    Affects Versions: 1.3.0
>            Reporter: Scott Carey
>             Fix For: 1.3.0
>         Attachments: AVRO-380.patch, AVRO-380.patch
>
>
> The new file format in AVRO-160 limits a few use cases that I have found to be important.
> A block currently contains a count of the number of records, the block data, and a sync marker.
> This change would add the block size, in bytes, alongside the number of records.
> This allows efficient access to a block's data without the need to decode the data into individual Datums, which is useful for various use cases.
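To make the first three bullets concrete, here is a minimal, hedged sketch of reading and validating one block. It is not the patched DataFileReader code: for brevity it reads the record count and block size as fixed-width longs via DataInputStream, whereas the actual container format uses Avro's variable-length long encoding, and the class and method names (BlockSketch, readBlock) are invented for illustration.

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;

    class BlockSketch {
      long blockCount;   // number of records in the block (a long in the format)
      byte[] blockData;  // raw (possibly compressed) block bytes

      // Reads one block's count, size, and data; the sync marker would follow.
      static BlockSketch readBlock(DataInputStream in) throws IOException {
        BlockSketch b = new BlockSketch();
        b.blockCount = in.readLong();      // record count, a long
        long blockSize = in.readLong();    // block size in bytes, also a long
        if (blockSize < 0 || blockSize > Integer.MAX_VALUE) {
          // This reader buffers a block in a byte[], so it cannot handle
          // blocks larger than Integer.MAX_VALUE bytes.
          throw new IOException(
              "Block size invalid or too large for this implementation: " + blockSize);
        }
        b.blockData = new byte[(int) blockSize];
        int read = 0;
        while (read < b.blockData.length) { // verify the block is not truncated
          int n = in.read(b.blockData, read, b.blockData.length - read);
          if (n < 0) {
            throw new EOFException(
                "Block truncated: expected " + blockSize + " bytes but got " + read);
          }
          read += n;
        }
        return b;
      }
    }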
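The DeflateCodec bullet distinguishes RFC-1951 'raw' deflate from the RFC-1950 zlib-wrapped form. The sketch below is not the DeflateCodec source; it only shows how java.util.zip produces and consumes a raw stream by constructing Deflater and Inflater with nowrap=true, which drops the zlib header and Adler-32 checksum.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;
    import java.util.zip.Inflater;
    import java.util.zip.InflaterInputStream;

    class RawDeflateSketch {
      // Compress to a raw RFC-1951 stream: nowrap=true omits the RFC-1950 zlib
      // header and Adler-32 checksum that a default Deflater would emit.
      static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DeflaterOutputStream out = new DeflaterOutputStream(
            bytes, new Deflater(Deflater.DEFAULT_COMPRESSION, true));
        out.write(data);
        out.finish();
        return bytes.toByteArray();
      }

      // Decompress a raw RFC-1951 stream; the Inflater must also be constructed
      // with nowrap=true or it will reject the data as missing a zlib header.
      static byte[] decompress(byte[] compressed) throws IOException {
        InflaterInputStream in = new InflaterInputStream(
            new ByteArrayInputStream(compressed), new Inflater(true));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
          out.write(buf, 0, n);
        }
        in.close();
        return out.toByteArray();
      }
    }

Mixing the two forms (a raw writer with a wrapped reader, or vice versa) fails, which is why the codec implementation and the spec's documentation need to agree on one of them.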
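The use case described in the quoted issue (getting at a block's bytes without decoding datums) amounts to being able to skip or copy a whole block from its descriptor alone. A rough sketch, under the same simplifying assumptions as above (fixed-width longs instead of Avro's variable-length encoding, an invented skipBlock helper, and a 16-byte sync marker assumed to follow each block):

    import java.io.DataInputStream;
    import java.io.IOException;

    class BlockSkipSketch {
      static final int SYNC_SIZE = 16; // length of the sync marker after each block

      // Advances past one block (count, size, data, sync) without decoding records.
      static long skipBlock(DataInputStream in) throws IOException {
        long recordCount = in.readLong(); // record count for the block
        long blockSize = in.readLong();   // byte length of the serialized block data
        long toSkip = blockSize + SYNC_SIZE;
        while (toSkip > 0) {
          long skipped = in.skip(toSkip);
          if (skipped <= 0) {
            throw new IOException("Unable to skip " + toSkip + " remaining bytes of block");
          }
          toSkip -= skipped;
        }
        return recordCount;
      }
    }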