[jira] [Commented] (ARROW-264) Create an Arrow File format

Wes McKinney (JIRA) Fri, 19 Aug 2016 16:05:07 -0700

    [ 
https://issues.apache.org/jira/browse/ARROW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429008#comment-15429008
 ]


Wes McKinney commented on ARROW-264:
------------------------------------

Looks like a good start to me. We should add some minor internal details, such 
as padding all byte buffers to start and end on 8-byte boundaries (according to 
the Arrow spec, memory will already be aligned and padded, but the serialized 
metadata may require padding bytes). This layout is similar to, but much more 
general than, what we did in Feather (which has a schema and record batch 
headers in a single metadata chunk, but only a single record batch and no 
dictionaries; see 
https://github.com/wesm/feather/blob/master/doc/FORMAT.md).
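
To make the padding point concrete, here is a minimal sketch of rounding a 
serialized metadata chunk up to an 8-byte boundary. The helper names 
(padded_length, pad_buffer) and the choice of zero bytes as filler are 
assumptions for illustration only; nothing in this issue fixes them.

```python
ALIGNMENT = 8  # bytes; the boundary suggested in the comment above


def padded_length(nbytes: int, alignment: int = ALIGNMENT) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment


def pad_buffer(data: bytes, alignment: int = ALIGNMENT) -> bytes:
    """Append filler so that len(result) is a multiple of alignment.

    Zero bytes are an assumed filler value, not something the issue specifies.
    """
    return data + b"\x00" * (padded_length(len(data), alignment) - len(data))
```

If every chunk written after the leading magic is padded this way and the 
first chunk starts on an 8-byte boundary, each following chunk starts on one 
as well.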

> Create an Arrow File format
> ---------------------------
>
>                 Key: ARROW-264
>                 URL: https://issues.apache.org/jira/browse/ARROW-264
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>
> File layout:
> (DictionaryBatch, RecordBatch, Schema as defined in Message.fbs)
> {noformat}
> MAGIC:   ARROW1
> (
> DictionaryBatch:  DictionaryBatch Header (FlatBuffer)
> DictionaryBatch: DictionaryBatch Body (buffers concatenated)
> )*
> (
> RecordBatch: RecordBatch Header (FlatBuffer)
> RecordBatch: RecordBatch Body (buffers concatenated)
> )+
> Footer: FlatBuffer
> Footer length: int (4 bytes unsigned LE)
> MAGIC: ARROW1
> {noformat}
> Footer definition:
> {noformat}
> table Footer {
>   schema: org.apache.arrow.flatbuf.Schema;
>   dictionaries: [ Block ];
>   recordBatches: [ Block ];
> }
> struct Block {
>   offset: long;
>   metaDataLength: int;
>   bodyLength: long;
> }
> {noformat}
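
As a rough sketch of how a reader might use the layout proposed above: the 
file ends with the footer, a 4-byte unsigned little-endian footer length, and 
the magic, so the footer can be located by reading backwards from the end. 
The function name locate_footer is hypothetical, the footer is treated as 
opaque bytes (no FlatBuffer decoding), and any padding that ends up in the 
final format is not accounted for.

```python
import struct
from typing import Tuple

MAGIC = b"ARROW1"  # magic string from the proposed layout


def locate_footer(data: bytes) -> Tuple[int, int]:
    """Return (footer_offset, footer_length) for a file in the proposed layout.

    Assumes the file ends with: footer | uint32 LE footer length | MAGIC.
    """
    trailer_size = 4 + len(MAGIC)
    if len(data) < len(MAGIC) + trailer_size:
        raise ValueError("file too short for the proposed layout")
    if data[:len(MAGIC)] != MAGIC or data[-len(MAGIC):] != MAGIC:
        raise ValueError("missing ARROW1 magic at start or end")

    # The footer length sits immediately before the trailing magic.
    length_pos = len(data) - trailer_size
    (footer_length,) = struct.unpack_from("<I", data, length_pos)

    footer_offset = length_pos - footer_length
    if footer_offset < len(MAGIC):
        raise ValueError("footer length exceeds file size")
    return footer_offset, footer_length
```

The bytes at data[footer_offset:footer_offset + footer_length] would then be 
decoded as the Footer table, whose Block entries give the offset, 
metaDataLength, and bodyLength of each dictionary and record batch.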



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
