[ https://issues.apache.org/jira/browse/ARROW-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869274#comment-16869274 ]
Joris Van den Bossche commented on ARROW-2136:
----------------------------------------------
You can also run into this when using {{pa.Table.from_arrays}}:
{code}
In [2]: schema = pa.schema([pa.field("a", pa.float64(), nullable=False)])
In [4]: table = pa.Table.from_arrays([pa.array([1.5, None])], schema=schema)
In [5]: table.schema
Out[5]: a: double
In [6]: table.schema.field_by_name('a')
Out[6]: pyarrow.Field<a: double not null>
In [7]: table.column('a')
Out[7]:
<Column name='a' type=DataType(double)>
[
  [
    1.5,
    null
  ]
]
{code}
Under the hood, this function calls {{Column(field, array)}}, and the Column
constructor assumes that the field's datatype matches the array's datatype.
There is a {{Column::ValidateData()}}, but judging by its implementation, it
only checks that the chunks' types equal the field's type and does not check
any other field properties such as nullability (although the doc comment says
"Verify that the column's array data is consistent with the passed field's
metadata").
> [Python] Non-nullable schema fields not checked in conversions from pandas
> --------------------------------------------------------------------------
>
> Key: ARROW-2136
> URL: https://issues.apache.org/jira/browse/ARROW-2136
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Reporter: Matthew Gilbert
> Assignee: Joris Van den Bossche
> Priority: Major
> Fix For: 0.14.0
>
>
> If you provide a schema with {{nullable=False}} but pass a {{DataFrame}}
> that in fact contains nulls, the schema appears to be ignored. I would expect
> an error here.
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"a": [1.2, 2.1, np.nan]})
> schema = pa.schema([pa.field("a", pa.float64(), nullable=False)])
> table = pa.Table.from_pandas(df, schema=schema)
> table[0]
> <pyarrow.lib.Column object at 0x7f213bf2fb70>
> chunk 0: <pyarrow.lib.DoubleArray object at 0x7f213bf20ea8>
> [
>   1.2,
>   2.1,
>   NA
> ]
> {code}