Re: [I] [Format] Encapsulated message format metadata_size ambiguity. [arrow]

via GitHub Sun, 19 Oct 2025 23:04:12 -0700


willtemperley commented on issue #47824:
URL: https://github.com/apache/arrow/issues/47824#issuecomment-3420684950


   @lidavidm Thanks for clarifying this. Yes, exactly: `metadata_size` in the 
encapsulated message format and `metaDataLength` in the `Block` struct in the 
footer refer to _almost_ the same thing, except that the footer value is the 
`metadata_size` plus the length of the encapsulated message format prefix.
   
   I think this is definitely confusing! Reading 
[encapsulated-message-format](https://arrow.apache.org/docs/format/Columnar.html#encapsulated-message-format):
   
   > IPC File Format
   > 
   > We define a “file format” supporting random access that is an extension of 
the stream format. The file starts and ends with a magic string ARROW1 (plus 
padding). What follows in the file is identical to the stream format. At the 
end of the file, we write a footer containing a redundant copy of the schema 
(which is a part of the streaming format) plus memory offsets and sizes for 
each of the data blocks in the file. This enables random access to any record 
batch in the file. See 
[File.fbs](https://github.com/apache/arrow/blob/main/format/File.fbs) for the 
precise details of the file footer.
   
   So looking at File.fbs we have:
   
   ```
   struct Block {
   
     /// Index to the start of the RecordBlock (note this is past the Message header)
     offset: long;
   
     /// Length of the metadata
     metaDataLength: int;
   
     /// Length of the data (this is aligned so there can be a gap between this and
     /// the metadata).
     bodyLength: long;
   }
   ```
   
   We have `metaDataLength` aka `metadata_size` but no mention of the prefix. 
Perhaps this could be made explicit?
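
To make the difference concrete, here is a minimal Python sketch of the relationship described above. It assumes the prefix is 8 bytes: the 4-byte `0xFFFFFFFF` continuation indicator followed by the 4-byte little-endian `metadata_size`, as laid out in the encapsulated message format. The helper names `read_prefix` and `footer_metadata_length` are hypothetical, the message bytes are made up for illustration, and this is not taken from any Arrow implementation.

```python
import struct

CONTINUATION = 0xFFFFFFFF  # continuation indicator from the encapsulated message format
PREFIX_LENGTH = 8          # assumption: 4-byte continuation + 4-byte int32 metadata_size


def read_prefix(buf: bytes, offset: int = 0) -> int:
    """Return metadata_size as stored in the 8-byte prefix at `offset`."""
    continuation, metadata_size = struct.unpack_from("<Ii", buf, offset)
    if continuation != CONTINUATION:
        raise ValueError("missing continuation indicator")
    return metadata_size


def footer_metadata_length(metadata_size: int) -> int:
    """Block.metaDataLength as described in this comment: metadata_size plus the prefix."""
    return metadata_size + PREFIX_LENGTH


# Hypothetical message: prefix followed by 120 zero bytes standing in for the
# padded Flatbuffers metadata.
message = struct.pack("<Ii", CONTINUATION, 120) + bytes(120)
size = read_prefix(message)
print(size, footer_metadata_length(size))  # prints "120 128"
```

Under that assumption, a reader that seeks to `offset` and reads `metaDataLength` bytes gets the prefix and the metadata together, which is why the two values differ by the prefix length.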


