liusitan opened a new issue, #13850:
URL: https://github.com/apache/arrow/issues/13850

   I am looking at
https://arrow.apache.org/cookbook/py/io.html#memory-mapping-arrow-arrays-from-disk
   Recently I have been trying to write Arrow data to disk in a format suitable for memory-mapped reads, but I don't know which format to use.
   The cookbook says:

   > Arrow arrays that have been written to disk in the **_Arrow IPC format_** can be memory mapped back directly from the disk.
   
   ```python
   with pa.memory_map('arraydata.arrow', 'r') as source:
       loaded_arrays = pa.ipc.open_file(source).read_all()
   arr = loaded_arrays[0]
   print(f"{arr[0]} .. {arr[-1]}")
   # 0 .. 99
   ```
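   For context, here is the full round trip I would expect based on my reading of the cookbook; the table contents and the file name `arraydata.arrow` are just placeholders mirroring the example above:

   ```python
   import pyarrow as pa

   # Build a table holding the 0..99 values from the cookbook example.
   table = pa.table({"values": pa.array(range(100))})

   # Write it with the *file* variant of the IPC format, which is what
   # pa.ipc.open_file() reads back.
   with pa.OSFile("arraydata.arrow", "wb") as sink:
       with pa.ipc.new_file(sink, table.schema) as writer:
           writer.write_table(table)

   # Memory-map it back, as in the cookbook snippet.
   with pa.memory_map("arraydata.arrow", "r") as source:
       loaded = pa.ipc.open_file(source).read_all()

   arr = loaded[0]
   print(f"{arr[0]} .. {arr[-1]}")  # 0 .. 99
   ```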
   
   What does the **Arrow IPC format** refer to here?
   Is it the Arrow IPC streaming format or the Arrow IPC file format, and what is the difference between the two?
   <img width="269" alt="image" src="https://user-images.githubusercontent.com/20542539/184042849-2a8fc178-e4b4-4c1e-8f36-7dddc61bfddb.png">
   
   Which Arrow format does `RecordBatchFileWriter` write?

   ```python
   with pa.OSFile('penguin-dataset.arrow', 'wb') as sink:
       with pa.RecordBatchFileWriter(sink, table.schema) as writer:
           writer.write_table(table)
   ```
   
   And how about this one? Which Arrow format does it write?

   ```python
   with open(filename, 'wb') as sink:
       with pa.ipc.RecordBatchStreamWriter(sink, pdf.schema) as writer:
           data = writer.write(pdf)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
