[jira] [Updated] (ARROW-4032) [Python] New pyarrow.Table.from_pydict() function

David Lee (JIRA) Fri, 14 Dec 2018 11:38:12 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Lee updated ARROW-4032:
-----------------------------
    Description: 
Here's a proposal to create a pyarrow.Table.from_pydict() function.

Right now only pyarrow.Table.from_pandas() exists, and there are inherent 
problems using Pandas with NULL support for Int(s) and Boolean(s)

[http://pandas.pydata.org/pandas-docs/version/0.23.4/gotchas.html]

See the section "{{NaN}}, Integer {{NA}} values and {{NA}} type promotions" on that page.

Sample Python code showing how this would work:

{code:python}
import pyarrow as pa
from datetime import datetime

test_list = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5, "city": "San Francisco"},
{"name": "Pam", "age": 7, "birthday": datetime.now()}
]

def from_pydict(pylist, columns):
    # Build one Arrow array per requested column; keys missing from a row become nulls.
    arrow_columns = [
        pa.array([row.get(column) for row in pylist]) for column in columns
    ]
    return pa.Table.from_arrays(arrow_columns, columns)

test = from_pydict(test_list, ['name' , 'age', 'city', 'birthday', 'dummy'])

{code}
Additional work would be needed to accept a schema object if you want to 
refine the data types further. I think the existing code in from_pandas() 
that handles this could be reused.

  was:
Here's a proposal to create a pyarrow.Table.from_pydict() function.

Right now only pyarrow.Table.from_pandas() exists, and there are inherent 
problems using Pandas with NULL support for Int(s) and Boolean(s)

[http://pandas.pydata.org/pandas-docs/version/0.23.4/gotchas.html]

See the section "{{NaN}}, Integer {{NA}} values and {{NA}} type promotions" on that page.

Sample Python code showing how this would work:

{code:python}
import pyarrow as pa
from datetime import datetime

test_list = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5, "city": "San Francisco"},
{"name": "Pam", "age": 7, "birthday": datetime.now()}
]

def from_pydict(pylist, columns):
    # Build one Arrow array per requested column; keys missing from a row become nulls.
    arrow_columns = [
        pa.array([row.get(column) for row in pylist]) for column in columns
    ]
    return pa.Table.from_arrays(arrow_columns, columns)

test = from_pydict(test_list, ['name' , 'age', 'city', 'birthday', 'dummy'])

{code}
 


> [Python] New pyarrow.Table.from_pydict() function
> -------------------------------------------------
>
>                 Key: ARROW-4032
>                 URL: https://issues.apache.org/jira/browse/ARROW-4032
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: Python
>            Reporter: David Lee
>            Priority: Minor
>
> Here's a proposal to create a pyarrow.Table.from_pydict() function.
> Right now only pyarrow.Table.from_pandas() exists, and there are inherent 
> problems using Pandas with NULL support for Int(s) and Boolean(s)
> [http://pandas.pydata.org/pandas-docs/version/0.23.4/gotchas.html]
> See the section "{{NaN}}, Integer {{NA}} values and {{NA}} type promotions" on that page.
> Sample Python code showing how this would work:
> {code:python}
> import pyarrow as pa
> from datetime import datetime
> test_list = [
> {"name": "Tom", "age": 10},
> {"name": "Mark", "age": 5, "city": "San Francisco"},
> {"name": "Pam", "age": 7, "birthday": datetime.now()}
> ]
> def from_pydict(pylist, columns):
>     # Build one Arrow array per requested column; keys missing from a row become nulls.
>     arrow_columns = [
>         pa.array([row.get(column) for row in pylist]) for column in columns
>     ]
>     return pa.Table.from_arrays(arrow_columns, columns)
> test = from_pydict(test_list, ['name' , 'age', 'city', 'birthday', 'dummy'])
> {code}
> Additional work would be needed to accept a schema object if you want to 
> refine the data types further. I think the existing code in from_pandas() 
> that handles this could be reused.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
