This is Vaishal from D. E. Shaw and Co.
We are interested in using pyarrow/Parquet for one of our projects that deals with numpy arrays. The pyarrow Parquet API supports storing pandas DataFrames on disk, but I could not find any support for storing numpy arrays directly. Since a numpy array is about the simplest container for data, I was surprised to find no function for writing one to Parquet. Is there a way to store a numpy array in Parquet format that I may have missed, or can we expect this support in a future version?

Pyarrow does offer a way to serialize numpy arrays using Tensors, but read_tensor requires the file to be opened in writeable mode, which in turn forces the use of memory-mapped files. Needing write access just to read a file looks like a bug. Can you please look into this?
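To make the first point concrete, the closest workaround I could find is to wrap the array in a single-column Table and go back through pandas on read (a rough sketch; the column name "data" is just a placeholder):

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

arr = np.arange(10, dtype=np.float64)

# Wrap the 1-D array in a single-column table and write it as Parquet.
table = pa.Table.from_arrays([pa.array(arr)], names=["data"])
pq.write_table(table, "array.parquet")

# Reading back still goes through pandas to recover a numpy array.
restored = pq.read_table("array.parquet").to_pandas()["data"].values
assert np.array_equal(arr, restored)

This only handles 1-D arrays naturally and still round-trips through pandas, which is why first-class numpy support would be welcome.

For the Tensor path, this is roughly the round-trip I mean (exact call locations may differ between pyarrow versions; write_tensor/read_tensor may live under pyarrow.ipc in newer releases). The "r+" mode on the memory map is the writeable-mode requirement in question:

import numpy as np
import pyarrow as pa

arr = np.random.rand(4, 5)
tensor = pa.Tensor.from_numpy(arr)

# Write the tensor using the Arrow IPC tensor format.
with pa.OSFile("tensor.arrow", "wb") as sink:
    pa.write_tensor(tensor, sink)

# Reading back needs a memory-mapped file, and only a writeable ("r+")
# map worked for me, which is the behaviour questioned above.
mmap = pa.memory_map("tensor.arrow", "r+")
restored = pa.read_tensor(mmap).to_numpy()
assert np.array_equal(arr, restored)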
--
Regards,
Vaishal Shah
Third year undergraduate student
Department of Computer Science and Engineering
IIT Kharagpur