[ https://issues.apache.org/jira/browse/ARROW-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486094#comment-17486094 ]

Matt Carothers edited comment on ARROW-15146 at 2/2/22, 8:57 PM:
-----------------------------------------------------------------

I have this issue as well. Specifically, if I write a parquet file with a 
uint64 column containing a value greater than 2^63 and then try to load it 
with a filter, I get that error. Loading the file without a filter works as 
expected. It looks like the validation checks against 2^63 instead of 2^64. 
Here's some quick code to reproduce the bug.

{{>>> import pandas as pd}}
{{>>> df = pd.DataFrame([ \{ 'col1' : 2**64 - 1 }, \{ 'col1' : 42 } ])}}
{{>>> df}}
{{                   col1}}
{{0  18446744073709551615}}
{{1                    42}}
{{>>> df.to_parquet('test.parquet')}}
{{>>> df = pd.read_parquet('test.parquet', filters=[('col1', '=', 42)])}}
{{Traceback (most recent call last):}}
{{  File "<stdin>", line 1, in <module>}}
{{  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pandas/io/parquet.py", line 493, in read_parquet}}
{{    return impl.read(}}
{{  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pandas/io/parquet.py", line 240, in read}}
{{    result = self.api.parquet.read_table(}}
{{  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pyarrow/parquet.py", line 1941, in read_table}}
{{    return dataset.read(columns=columns, use_threads=use_threads,}}
{{  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pyarrow/parquet.py", line 1776, in read}}
{{    table = self._dataset.to_table(}}
{{  File "pyarrow/_dataset.pyx", line 491, in pyarrow._dataset.Dataset.to_table}}
{{  File "pyarrow/_dataset.pyx", line 3235, in pyarrow._dataset.Scanner.to_table}}
{{  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status}}
{{  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status}}
{{pyarrow.lib.ArrowInvalid: Integer value 18446744073709551615 not in range: 0 to 9223372036854775807}}



> ArrowInvalid: Integer value 
> ----------------------------
>
>                 Key: ARROW-15146
>                 URL: https://issues.apache.org/jira/browse/ARROW-15146
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet
>    Affects Versions: 6.0.1
>         Environment: Ubuntu 20.04, PyArrow 6.01, Python 3.9
>            Reporter: mondonomo
>            Priority: Major
>
> I've created a parquet db with an uint64 datatype. When reading some of the 
> files are raising the errors like
> {quote}ArrowInvalid: Integer value 12120467241726599441 not in range: 0 to 
> 9223372036854775807
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)