[ https://issues.apache.org/jira/browse/ARROW-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Lee updated ARROW-4032:
-----------------------------
Description:

Here's a proposal to create a pyarrow.Table.from_pydict() function.

Right now only pyarrow.Table.from_pandas() exists, and there are inherent problems with Pandas' NULL support for Int(s) and Boolean(s) (see "{{NaN}}, Integer {{NA}} values and {{NA}} type promotions" in [http://pandas.pydata.org/pandas-docs/version/0.23.4/gotchas.html]).

Sample Python code showing how this would work:

{code:python}
import pyarrow as pa
from datetime import datetime

test_list = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "city": "San Francisco"},
    {"name": "Pam", "age": 7, "birthday": datetime.now()}
]

def from_pydict(pylist, columns):
    arrow_columns = []
    for column in columns:
        # One Arrow array per requested column; rows missing the key contribute None (null).
        arrow_columns.append(pa.array([row.get(column) for row in pylist]))
    return pa.Table.from_arrays(arrow_columns, names=columns)

test = from_pydict(test_list, ['name', 'age', 'city', 'birthday', 'dummy'])
{code}

Additional work would be needed to pass in a schema object if you want to refine data types further. I think the existing code from from_pandas() to do that would work.

> [Python] New pyarrow.Table.from_pydict() function
> -------------------------------------------------
>
>                 Key: ARROW-4032
>                 URL: https://issues.apache.org/jira/browse/ARROW-4032
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: Python
>            Reporter: David Lee
>            Priority: Minor