[ https://issues.apache.org/jira/browse/ARROW-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721745#comment-16721745 ]
David Lee commented on ARROW-4032:
----------------------------------

Updated the sample code to include schema and safe options. Passing in a schema allows conversions from microseconds to milliseconds.

> [Python] New pyarrow.Table.from_pydict() function
> -------------------------------------------------
>
>                 Key: ARROW-4032
>                 URL: https://issues.apache.org/jira/browse/ARROW-4032
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: Python
>            Reporter: David Lee
>            Priority: Minor
>
> Here's a proposal to create a pyarrow.Table.from_pydict() function.
> Right now only pyarrow.Table.from_pandas() exists, and there are inherent
> problems using Pandas with NULL support for Int(s) and Boolean(s):
> [http://pandas.pydata.org/pandas-docs/version/0.23.4/gotchas.html]
> {{NaN}}, Integer {{NA}} values and {{NA}} type promotions.
> Sample Python code showing how this would work:
>
> {code:java}
> import pyarrow as pa
> from datetime import datetime
>
> # Convert microseconds to milliseconds. More support for MS in parquet.
> today = datetime.now()
> today = datetime(today.year, today.month, today.day, today.hour,
>                  today.minute, today.second,
>                  today.microsecond - today.microsecond % 1000)
>
> pylist = [
>     {"name": "Tom", "age": 10},
>     {"name": "Mark", "age": 5, "city": "San Francisco"},
>     {"name": "Pam", "age": 7, "birthday": today}
> ]
>
> def from_pydict(pylist, schema=None, columns=None, safe=True):
>     arrow_columns = list()
>     # A schema, when given, decides the column names.
>     if schema:
>         columns = schema.names
>     if not columns:
>         return
>     # Build one array per column; keys missing from a row become None.
>     for column in columns:
>         arrow_columns.append(
>             pa.array([v[column] if column in v else None for v in pylist]))
>     arrow_table = pa.Table.from_arrays(arrow_columns, columns)
>     # Cast to the requested types, e.g. timestamp('us') to timestamp('ms').
>     if schema:
>         arrow_table = arrow_table.cast(schema, safe=safe)
>     return arrow_table
>
> test = from_pydict(pylist, columns=['name', 'age', 'city', 'birthday',
>                                     'dummy'])
>
> test_schema = pa.schema([
>     pa.field('name', pa.string()),
>     pa.field('age', pa.int16()),
>     pa.field('city', pa.string()),
>     pa.field('birthday', pa.timestamp('ms'))
> ])
>
> test2 = from_pydict(pylist, schema=test_schema)
> {code}
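
For readers following the proposal, the row-to-column step can be tried without pyarrow installed. The sketch below is only an illustration under stated assumptions: `truncate_to_millis`, `infer_columns`, and `rows_to_columns` are hypothetical helper names that are not part of the proposal or of pyarrow, and inferring column names from the rows is an addition that the sample code above does not do (it returns early when no columns are given).

```python
from datetime import datetime


def truncate_to_millis(ts):
    """Drop sub-millisecond precision, as the sample code does for `today`."""
    return ts.replace(microsecond=ts.microsecond - ts.microsecond % 1000)


def infer_columns(pylist):
    """Collect keys across all rows in first-seen order (hypothetical helper)."""
    seen = {}
    for row in pylist:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


def rows_to_columns(pylist, columns=None):
    """Pivot a list of row dicts into {column: [values]}; gaps become None."""
    if columns is None:
        columns = infer_columns(pylist)  # assumption: not in the proposal
    return {column: [row.get(column) for row in pylist] for column in columns}


# Fixed timestamp chosen for illustration only.
birthday = truncate_to_millis(datetime(2018, 12, 14, 9, 30, 15, 123456))

pylist = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "city": "San Francisco"},
    {"name": "Pam", "age": 7, "birthday": birthday},
]

columns = rows_to_columns(pylist)
```

Each list in `columns` has one entry per input row, which is the shape the proposal hands to `pa.array()` before assembling the table. Using `None` for missing keys is what lets integer and boolean columns keep their types instead of being promoted as they would be in Pandas.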