That's the best option right now, but it does create a copy of the data in Arrow format. It would be nice to have a function for this that is more efficient in both speed and memory use:
https://issues.apache.org/jira/browse/ARROW-1993

- Wes

On Fri, Jan 12, 2018 at 4:44 PM, Li Jin <ice.xell...@gmail.com> wrote:
> Hi all,
>
> I am wondering what's the best way to get pyarrow schema from a pandas
> DataFrame?
>
> So far I have:
>
> "
> pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0], 'c': ["hello",
> "world", "arrow"], 'd': [[1.0, 2.0], [3.0], [4.0, 5.0]]})
>
> arrow_schema = pa.Table.from_pandas(pdf, preserve_index=False).schema
> "
>
> Type inference works pretty well here. I am wondering if this is the
> correct way of doing this and if this approach works well with large
> DataFrames?