[
https://issues.apache.org/jira/browse/ARROW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429008#comment-15429008
]
Wes McKinney commented on ARROW-264:
------------------------------------
Looks like a good start to me. We should add some minor internal details, like
padding all byte buffers to start and end on 8-byte boundaries (according to
the Arrow spec, memory will already be aligned and padded, but the serialized
metadata may require padding bytes). This layout is similar to, but much more
general than, the one we used in Feather (which has a schema and record batch
headers in a single metadata chunk, but only a single record batch and no
dictionaries; see
https://github.com/wesm/feather/blob/master/doc/FORMAT.md).
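
To make the padding suggestion concrete, here is a minimal sketch of rounding
a serialized chunk up to an 8-byte boundary. The helper names (padded_length,
pad_buffer) and the choice of zero bytes as filler are assumptions for
illustration only; nothing here is taken from the Arrow codebase or spec.

```python
ALIGNMENT = 8  # boundary size suggested in the comment above


def padded_length(nbytes: int, alignment: int = ALIGNMENT) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment


def pad_buffer(data: bytes, alignment: int = ALIGNMENT) -> bytes:
    """Append zero bytes (an assumed filler) so data ends on a boundary."""
    return data + b"\x00" * (padded_length(len(data), alignment) - len(data))
```

If every chunk written to the file starts at an aligned offset and is padded
this way, the chunk that follows it also starts aligned, which would cover
the serialized metadata case mentioned above.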
> Create an Arrow File format
> ---------------------------
>
> Key: ARROW-264
> URL: https://issues.apache.org/jira/browse/ARROW-264
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
>
> File layout:
> (DictionaryBatch, RecordBatch, Schema as defined in Message.fbs)
> {noformat}
> MAGIC: ARROW1
> (
> DictionaryBatch: DictionaryBatch Header (FlatBuffer)
> DictionaryBatch: DictionaryBatch Body (buffers concatenated)
> )*
> (
> RecordBatch: RecordBatch Header (FlatBuffer)
> RecordBatch: RecordBatch Body (buffers concatenated)
> )+
> Footer: FlatBuffer
> Footer length: int (4 bytes unsigned LE)
> MAGIC: ARROW1
> {noformat}
> Footer definition:
> {noformat}
> table Footer {
> schema: org.apache.arrow.flatbuf.Schema;
> dictionaries: [ Block ];
> recordBatches: [ Block ];
> }
> struct Block {
> offset: long;
> metaDataLength: int;
> bodyLength: long;
> }
> {noformat}
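
As a rough illustration of how a reader might use the proposed layout, the
sketch below locates the footer by working backwards from the end of the
file and then slices one block using the Block fields. It assumes the layout
exactly as quoted (no padding bytes), treats the footer as opaque bytes
rather than decoding a real FlatBuffer, and assumes each body immediately
follows its header. The names write_file, locate_footer and read_block are
hypothetical.

```python
import struct
from typing import Tuple

MAGIC = b"ARROW1"
LENGTH_SIZE = 4  # footer length: 4 bytes, unsigned, little-endian


def write_file(messages: bytes, footer: bytes) -> bytes:
    """Assemble MAGIC + messages + footer + footer length + MAGIC."""
    return MAGIC + messages + footer + struct.pack("<I", len(footer)) + MAGIC


def locate_footer(data: bytes) -> Tuple[int, int]:
    """Return (footer_offset, footer_length) for a file in this layout."""
    trailer = LENGTH_SIZE + len(MAGIC)
    if len(data) < len(MAGIC) + trailer:
        raise ValueError("file too short")
    if not data.startswith(MAGIC) or not data.endswith(MAGIC):
        raise ValueError("missing magic bytes")
    length_pos = len(data) - trailer
    (footer_length,) = struct.unpack_from("<I", data, length_pos)
    footer_offset = length_pos - footer_length
    if footer_offset < len(MAGIC):
        raise ValueError("footer length exceeds file size")
    return footer_offset, footer_length


def read_block(data: bytes, offset: int, meta_len: int,
               body_len: int) -> Tuple[bytes, bytes]:
    """Slice (header, body) for one Block; body assumed to follow header."""
    header = data[offset:offset + meta_len]
    body = data[offset + meta_len:offset + meta_len + body_len]
    return header, body
```

The point of the trailing length and magic is that a reader can find the
footer with one read from the end of the file, then seek directly to any
dictionary or record batch through the offsets stored in its Block entries.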