[https://issues.apache.org/jira/browse/ARROW-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509736#comment-17509736]
Alessandro Molina commented on ARROW-15977:
-------------------------------------------
I think the problem might be that we are not propagating the conversion
options. At least for structs that seems to be happening here:
[https://github.com/apache/arrow/blob/06d0e650080a262146cbe5cdba2598390625ed1f/cpp/src/arrow/python/numpy_to_arrow.cc#L767-L768]
where the {{cast_options}} argument is not provided
([https://github.com/apache/arrow/blob/06d0e650080a262146cbe5cdba2598390625ed1f/cpp/src/arrow/python/numpy_to_arrow.cc#L183-L185]).
I'll need to investigate further where the ListType behaviour is coming from
instead, as that type doesn't seem to be visitable by the {{NumPyConverter}}.
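
For reference, the failure can be reproduced without the Parquet round trip by
converting a pandas Series directly. This is a minimal sketch, assuming the
pyarrow 7.0.0 behaviour reported below (Array.from_pandas is used here instead
of Table.from_pandas; the flat cast honours safe=False while the nested one
does not):
{code:python}
import pandas as pd
import pyarrow as pa

# Flat int column: the unsafe cast is honoured and the value wraps to a
# negative int32, matching the "it works" example in the issue.
flat = pd.Series([3046682132, 1234])
print(pa.Array.from_pandas(flat, type=pa.int32(), safe=False))

# list<int32> column: the conversion options reportedly are not forwarded to
# the child converter, so on 7.0.0 this is expected to raise ArrowInvalid.
nested = pd.Series([[2, 3046682132, 2], None])
try:
    pa.Array.from_pandas(nested, type=pa.list_(pa.int32()), safe=False)
except pa.ArrowInvalid as exc:
    print("list conversion still fails:", exc)
{code}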
> [Python] Can't ignore the overflow error.
> -----------------------------------------
>
> Key: ARROW-15977
> URL: https://issues.apache.org/jira/browse/ARROW-15977
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 7.0.0
> Reporter: taichi kato
> Priority: Major
>
> I know that passing safe=False to pa.Table.from_pandas ignores overflow
> errors, but it does not ignore overflow inside a list or a struct.
> The following works:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> import json
>
> test_json = [
>     {
>         "name": "taro",
>         "id": 3046682132,
>         "points": [2, 2, 2],
>         "groups": {
>             "group_name": "baseball",
>             "group_id": 1234
>         }
>     },
>     {
>         "name": "taro",
>         "id": 1234,
>     }
> ]
>
> schema = pa.schema([
>     pa.field('name', pa.string()),
>     pa.field('id', pa.int32()),
>     pa.field("points", pa.list_(pa.int32())),
>     pa.field('groups', pa.struct([
>         pa.field("group_name", pa.string()),
>         pa.field("group_id", pa.int32()),
>     ])),
> ])
>
> writer = pq.ParquetWriter('test_schema.parquet', schema=schema)
> df = pd.DataFrame(test_json)
> table = pa.Table.from_pandas(df, schema=schema, safe=False)
> writer.write_table(table)
> writer.close()
>
> table = pq.read_table("test_schema.parquet")
> print(table)
> {code}
> {code:none}
> name: [["taro","taro"]]
> id: [[-1248285164,1234]]
> points: [[[2,2,2],null]]
> groups: [
>   -- is_valid: [true, false]
>   -- child 0 type: string ["baseball", null]
>   -- child 1 type: int32 [1234, null]]
> {code}
> However, the following two do not work.
>
> {code:python}
> test_json = [
>     {
>         "name": "taro",
>         "id": 2,
>         "points": [2, 3046682132, 2],
>         "groups": {
>             "group_name": "baseball",
>             "group_id": 1234
>         }
>     },
>     {
>         "name": "taro",
>         "id": 1234,
>     }
> ]
> {code}
> {code:none}
> Traceback (most recent call last):
>   File "test_pyarrow.py", line 35, in <module>
>     table = pa.Table.from_pandas(df, schema=schema, safe=False)
>   File "pyarrow/table.pxi", line 1782, in pyarrow.lib.Table.from_pandas
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in dataframe_to_arrays
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in <listcomp>
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 581, in convert_column
>     raise e
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 575, in convert_column
>     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ('Value 3046682132 too large to fit in C integer type', 'Conversion failed for column points with type object')
> {code}
> {code:python}
> test_json = [
>     {
>         "name": "taro",
>         "id": 2,
>         "points": [2, 2, 2],
>         "groups": {
>             "group_name": "baseball",
>             "group_id": 3046682132
>         }
>     },
>     {
>         "name": "taro",
>         "id": 1234,
>     }
> ]
> {code}
> {code:none}
> Traceback (most recent call last):
>   File "test_pyarrow.py", line 35, in <module>
>     table = pa.Table.from_pandas(df, schema=schema, safe=False)
>   File "pyarrow/table.pxi", line 1782, in pyarrow.lib.Table.from_pandas
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in dataframe_to_arrays
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in <listcomp>
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 581, in convert_column
>     raise e
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 575, in convert_column
>     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ('Value 3046682132 too large to fit in C integer type', 'Conversion failed for column groups with type object')
> {code}
> Could you please fix this bug?
> pyarrow==7.0.0