Hi Vaishal,

You can certainly use NumPy arrays to create Parquet files, but you will have to do a bit of work to adapt the NumPy arrays to Parquet's (and Arrow's) columnar data model. A pandas DataFrame contains NumPy arrays internally.
import pyarrow as pa
import pyarrow.parquet as pq
import numpy as np

arr = np.random.randn(100)
a0 = pa.Array.from_pandas(arr)
t = pa.Table.from_arrays([a0], ['col0'])
pq.write_table(t, 'test.parquet')
returned_t = pq.read_table('test.parquet')

The function Array.from_pandas may be misleading -- it accepts any Series or 1-dimensional ndarray and converts it according to pandas's NumPy-based memory model (e.g. it will convert arrays of Python objects to various Arrow types). We intend to make the pyarrow.array function a better entry point for vanilla NumPy data that did not originate in pandas.

Patches to improve the API / user experience for standalone NumPy users would be a great way to contribute to the project(s); see ARROW-564, ARROW-838, and ARROW-488. It would be very useful to be able to construct Arrow arrays from a NumPy array plus a boolean mask for nulls, for example.

Thanks,
Wes

On Thu, Jun 1, 2017 at 4:34 AM, Shah, Vaishal <vaishal.s...@deshaw.com> wrote:
> This is Vaishal from D. E. Shaw and Co.
>
> We are interested in using py-arrow/parquet for one of our projects, which
> deals with numpy arrays.
> Parquet provides an API to store pandas DataFrames on disk, but I could not
> find any support for storing numpy arrays.
> Since numpy is such a common way to store data, I was surprised to find no
> function to store numpy arrays in parquet format. Is there a way to store a
> numpy array in parquet format that I may have missed?
> Or can we expect this support in a newer version of parquet?
>
> Thanks,
> Vaishal