Re: What's the best way to get pyarrow schema from a pandas DataFrame?

Li Jin Fri, 12 Jan 2018 15:07:22 -0800

Thanks Wes!

On Fri, Jan 12, 2018 at 6:04 PM, Wes McKinney <[email protected]> wrote:


> That's the best option right now, but it does cause a copy of the data
> in Arrow format to be created. It would be nice to have a more
> efficient function (in both speed and memory use) for this:
>
> https://issues.apache.org/jira/browse/ARROW-1993
>
> - Wes
>
> On Fri, Jan 12, 2018 at 4:44 PM, Li Jin <[email protected]> wrote:
> > Hi all,
> >
> > I am wondering what's the best way to get pyarrow schema from a pandas
> > DataFrame?
> >
> > So far I have:
> >
> > "
> > import pandas as pd
> > import pyarrow as pa
> >
> > pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0],
> >                     'c': ["hello", "world", "arrow"],
> >                     'd': [[1.0, 2.0], [3.0], [4.0, 5.0]]})
> >
> > arrow_schema = pa.Table.from_pandas(pdf, preserve_index=False).schema
> > "
> >
> > Type inference works pretty well here. I am wondering if this is the
> > correct way of doing this and if this approach works well with large
> > DataFrames?
>
