[https://issues.apache.org/jira/browse/ARROW-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509736#comment-17509736]
Alessandro Molina commented on ARROW-15977:
-------------------------------------------
I think the problem might be that we are not propagating the conversion
options. At least for structs that seems to be happening here:
[https://github.com/apache/arrow/blob/06d0e650080a262146cbe5cdba2598390625ed1f/cpp/src/arrow/python/numpy_to_arrow.cc#L767-L768]
where the {{cast_options}} argument is not provided
([https://github.com/apache/arrow/blob/06d0e650080a262146cbe5cdba2598390625ed1f/cpp/src/arrow/python/numpy_to_arrow.cc#L183-L185]).
I'll need to investigate further where the ListType behaviour is coming from
instead, as that type doesn't seem to be visitable by the {{NumPyConverter}}.
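
For reference, the failure can be reproduced without the Parquet round trip by
converting a pandas Series directly. This is a minimal sketch, assuming the
pyarrow 7.0.0 behaviour reported below (Array.from_pandas is used here instead
of Table.from_pandas; the flat cast honours safe=False while the nested one
does not):
{code:python}
import pandas as pd
import pyarrow as pa

# Flat int column: the unsafe cast is honoured and the value wraps to a
# negative int32, matching the "it works" example in the issue.
flat = pd.Series([3046682132, 1234])
print(pa.Array.from_pandas(flat, type=pa.int32(), safe=False))

# list<int32> column: the conversion options reportedly are not forwarded to
# the child converter, so on 7.0.0 this is expected to raise ArrowInvalid.
nested = pd.Series([[2, 3046682132, 2], None])
try:
    pa.Array.from_pandas(nested, type=pa.list_(pa.int32()), safe=False)
except pa.ArrowInvalid as exc:
    print("list conversion still fails:", exc)
{code}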
> [Python] Can't ignore the overflow error.
> -----------------------------------------
>
> Key: ARROW-15977
> URL: https://issues.apache.org/jira/browse/ARROW-15977
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 7.0.0
> Reporter: taichi kato
> Priority: Major
>
> I know that passing safe=False to pa.Table.from_pandas ignores overflow
> errors, but it does not ignore overflow inside a list or a struct.
> The following works:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> import json
>
> test_json = [
>     {
>         "name": "taro",
>         "id": 3046682132,
>         "points": [2, 2, 2],
>         "groups": {
>             "group_name": "baseball",
>             "group_id": 1234
>         }
>     },
>     {
>         "name": "taro",
>         "id": 1234,
>     }
> ]
>
> schema = pa.schema([
>     pa.field('name', pa.string()),
>     pa.field('id', pa.int32()),
>     pa.field("points", pa.list_(pa.int32())),
>     pa.field('groups', pa.struct([
>         pa.field("group_name", pa.string()),
>         pa.field("group_id", pa.int32()),
>     ])),
> ])
>
> writer = pq.ParquetWriter('test_schema.parquet', schema=schema)
> df = pd.DataFrame(test_json)
> table = pa.Table.from_pandas(df, schema=schema, safe=False)
> writer.write_table(table)
> writer.close()
>
> table = pq.read_table("test_schema.parquet")
> print(table)
> {code}
> {code:none}
> name: [["taro","taro"]]
> id: [[-1248285164,1234]]
> points: [[[2,2,2],null]]
> groups: [
>   -- is_valid: [true, false]
>   -- child 0 type: string ["baseball", null]
>   -- child 1 type: int32 [1234, null]]
> {code}
> However, the following two do not work.
>
> {code:python}
> test_json = [
>     {
>         "name": "taro",
>         "id": 2,
>         "points": [2, 3046682132, 2],
>         "groups": {
>             "group_name": "baseball",
>             "group_id": 1234
>         }
>     },
>     {
>         "name": "taro",
>         "id": 1234,
>     }
> ]
> {code}
> {code:none}
> Traceback (most recent call last):
>   File "test_pyarrow.py", line 35, in <module>
>     table = pa.Table.from_pandas(df, schema=schema, safe=False)
>   File "pyarrow/table.pxi", line 1782, in pyarrow.lib.Table.from_pandas
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in dataframe_to_arrays
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in <listcomp>
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 581, in convert_column
>     raise e
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 575, in convert_column
>     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ('Value 3046682132 too large to fit in C integer type', 'Conversion failed for column points with type object')
> {code}
> {code:python}
> test_json = [
>     {
>         "name": "taro",
>         "id": 2,
>         "points": [2, 2, 2],
>         "groups": {
>             "group_name": "baseball",
>             "group_id": 3046682132
>         }
>     },
>     {
>         "name": "taro",
>         "id": 1234,
>     }
> ]
> {code}
> {code:none}
> Traceback (most recent call last):
>   File "test_pyarrow.py", line 35, in <module>
>     table = pa.Table.from_pandas(df, schema=schema, safe=False)
>   File "pyarrow/table.pxi", line 1782, in pyarrow.lib.Table.from_pandas
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in dataframe_to_arrays
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 594, in <listcomp>
>     arrays = [convert_column(c, f)
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 581, in convert_column
>     raise e
>   File "/home/s0108403058/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 575, in convert_column
>     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ('Value 3046682132 too large to fit in C integer type', 'Conversion failed for column groups with type object')
> {code}
> Could you please fix this bug?
> pyarrow==7.0.0