That's the best option right now, but it does create a full copy of the
data in Arrow format. It would be nice to have a more efficient function
for this, both in speed and in memory use:

https://issues.apache.org/jira/browse/ARROW-1993

- Wes

On Fri, Jan 12, 2018 at 4:44 PM, Li Jin <ice.xell...@gmail.com> wrote:
> Hi all,
>
> What's the best way to get a pyarrow schema from a pandas
> DataFrame?
>
> So far I have:
>
> "
> import pandas as pd
> import pyarrow as pa
>
> pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0],
>                     'c': ["hello", "world", "arrow"],
>                     'd': [[1.0, 2.0], [3.0], [4.0, 5.0]]})
>
> arrow_schema = pa.Table.from_pandas(pdf, preserve_index=False).schema
> "
>
> Type inference works pretty well here. Is this the correct way to do
> it, and does this approach work well with large DataFrames?
