Re: [PR] GH-49946: [Format] Better document equivalence between IPC file and streams [arrow]

via GitHub Mon, 11 May 2026 06:53:39 -0700


paleolimbot commented on code in PR #49947:
URL: https://github.com/apache/arrow/pull/49947#discussion_r3219434036



##########
docs/source/format/Columnar.rst:
##########
@@ -1524,21 +1529,46 @@ Schematically we have: ::
     <empty padding bytes [to 8 byte boundary]>
     <STREAMING FORMAT with EOS>
     <FOOTER>
-    <FOOTER SIZE: int32>
+    <FOOTER SIZE: little-endian int32>
     <magic number "ARROW1">
 
-In the file format, there is no requirement that dictionary keys
-should be defined in a ``DictionaryBatch`` before they are used in a
-``RecordBatch``, as long as the keys are defined somewhere in the
-file. Further more, it is invalid to have more than one **non-delta**
-dictionary batch per dictionary ID (i.e. dictionary replacement is not
-supported). Delta dictionaries are applied in the order they appear in
-the file footer. We recommend the ".arrow" extension for files created with
-this format. Note that files created with this format are sometimes called
-"Feather V2" or with the ".feather" extension, the name and the extension
-derived from "Feather (V1)", which was a proof of concept early in
-the Arrow project for language-agnostic fast data frame storage for
-Python (pandas) and R.
+Equivalence with the IPC Streaming Format
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While it is theoretically possible for the IPC File footer to list RecordBatch
+messages in a differing order from the embedded IPC Stream's sequential order
+(or even to repeat or omit some of the IPC Stream's RecordBatch messages),
+compliant writers SHOULD arrange the IPC File footer so that an IPC File can be
+read using an IPC Stream reader with equivalent results.

Review Comment:
   That's what I was thinking of (but no need to deal with this now 🙂 )
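
For readers following the hunk above, here is a minimal sketch of how the trailer it documents (`<FOOTER>`, then `<FOOTER SIZE: little-endian int32>`, then the `ARROW1` magic) could be located in a file image. It uses only the Python standard library. The function name `read_footer_span` and the synthetic bytes in the usage note are hypothetical and are not part of Arrow or of this PR. The sketch does not decode the footer's Flatbuffers payload.

```python
import struct

MAGIC = b"ARROW1"
# Trailer = 4-byte little-endian int32 footer size, followed by the magic.
TRAILER_LEN = 4 + len(MAGIC)


def read_footer_span(data: bytes) -> tuple[int, int]:
    """Return (start, end) byte offsets of the footer within an IPC file image."""
    if len(data) < TRAILER_LEN or data[-len(MAGIC):] != MAGIC:
        raise ValueError("missing trailing ARROW1 magic")
    # "<i" reads a signed 32-bit integer in little-endian byte order.
    (footer_size,) = struct.unpack("<i", data[-TRAILER_LEN:-len(MAGIC)])
    end = len(data) - TRAILER_LEN
    start = end - footer_size
    if footer_size < 0 or start < 0:
        raise ValueError("implausible footer size")
    return start, end
```

Usage, on made-up bytes rather than a real Arrow file:

```python
fake = b"STREAM" + b"FOOTERBYTES" + struct.pack("<i", 11) + MAGIC
start, end = read_footer_span(fake)
footer = fake[start:end]  # the 11 bytes standing in for the footer
```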



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
