giacomo created ARROW-5169:
------------------------------

             Summary: non-nullable fields are converted to nullable in {{Table.from_pandas}}
                 Key: ARROW-5169
                 URL: https://issues.apache.org/jira/browse/ARROW-5169
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.13.0
            Reporter: giacomo

In version 0.13.0, {{Table.from_pandas}} modifies the input schema by making all non-nullable fields nullable. This causes problems with code such as the following:

{code}
import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(list(range(200)), columns=['numcol'])
schema = pa.schema([
    pa.field('numcol', pa.int64(), nullable=False),
])
writer = pq.ParquetWriter(io.BytesIO(), schema, version='2.0')
# The returned table's 'numcol' field is nullable, unlike the writer's schema.
table = pa.Table.from_pandas(df, schema=schema)
writer.write_table(table)
{code}

The write fails because the writer schema and the table schema differ.

I believe the direct cause could be [https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L622], where nullable is set to True by default, resulting in the table schema being modified.

Thanks for your valuable work on this library.

Giacomo