[
https://issues.apache.org/jira/browse/ARROW-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486094#comment-17486094
]
Matt Carothers commented on ARROW-15146:
----------------------------------------
I have this issue as well. Specifically, if I write a parquet file with a
uint64 column containing a value greater than 2**63 and then try to load it
with a filter, I get that error. Loading the file without a filter works as
expected. It looks like the validation checks against the signed int64 maximum
(2**63 - 1) instead of the uint64 maximum (2**64 - 1). Here's some quick code
to reproduce the bug.
>>> import pandas as pd
>>> df = pd.DataFrame([ { 'col1' : 2**64 - 1 }, { 'col1' : 42 } ])
>>> df
                   col1
0  18446744073709551615
1                    42
>>> df.to_parquet('test.parquet')
>>> df = pd.read_parquet('test.parquet', filters=[('col1', '=', 42)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pandas/io/parquet.py", line 493, in read_parquet
    return impl.read(
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pandas/io/parquet.py", line 240, in read
    result = self.api.parquet.read_table(
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pyarrow/parquet.py", line 1941, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pyarrow/parquet.py", line 1776, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 491, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3235, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Integer value 18446744073709551615 not in range: 0 to 9223372036854775807
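For reference, the bounds in the error message line up with this reading: 9223372036854775807 is the signed int64 maximum (2**63 - 1), while the rejected value is the uint64 maximum (2**64 - 1). A quick sanity check in plain Python, no pyarrow needed:

```python
# The upper bound quoted in the ArrowInvalid message is the signed
# int64 maximum, not the uint64 maximum, so any uint64 value above
# 2**63 - 1 fails the filter's range validation.
INT64_MAX = 2**63 - 1   # 9223372036854775807, the bound in the error
UINT64_MAX = 2**64 - 1  # 18446744073709551615, the value in the column

print(INT64_MAX)              # → 9223372036854775807
print(UINT64_MAX)             # → 18446744073709551615
print(UINT64_MAX > INT64_MAX) # → True: the value is outside the checked range
```

Until this is fixed, reading the file without filters (which works, as noted above) and filtering afterwards in pandas, e.g. df[df['col1'] == 42], avoids the validation.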
> ArrowInvalid: Integer value
> ----------------------------
>
> Key: ARROW-15146
> URL: https://issues.apache.org/jira/browse/ARROW-15146
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet
> Affects Versions: 6.0.1
> Environment: Ubuntu 20.04, PyArrow 6.0.1, Python 3.9
> Reporter: mondonomo
> Priority: Major
>
> I've created a parquet db with a uint64 datatype. When reading, some of the
> files raise errors like
> {quote}ArrowInvalid: Integer value 12120467241726599441 not in range: 0 to
> 9223372036854775807
> {quote}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)