This is Vaishal from D. E. Shaw and Co.


We are interested in using pyarrow/Parquet for one of our projects, which
deals with NumPy arrays.

The Parquet module provides an API to store pandas DataFrames on disk, but I
could not find any support for storing NumPy arrays.


Since a NumPy array is such a basic way to hold data, I was surprised to find
no function for storing one in Parquet format. Is there a way to store a NumPy
array in Parquet format that I may have missed?

Or can we expect this support in a newer version of Parquet?


pyarrow does provide a way to store arrays using Tensors. However, read_tensor
requires the file to be opened in writeable mode, which forces us to use
memory-mapped files. Needing write access just to read a file looks like a
bug. Could you please look into this?



-- 
*Regards*

*Vaishal Shah,*
*Third year Undergraduate student,*
*Department of Computer Science and Engineering,*
*IIT Kharagpur*
