[
https://issues.apache.org/jira/browse/ARROW-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486094#comment-17486094
]
Matt Carothers commented on ARROW-15146:
----------------------------------------
I have this issue as well. Specifically, if I write a parquet file with a
uint64 column containing a value greater than 2**63 and then try to load it
with a filter, I get that error. Loading the file without a filter works as
expected. It looks like the validation checks against the signed int64 maximum
(2**63 - 1) instead of the uint64 maximum (2**64 - 1). Here's some quick code
to reproduce the bug.
>>> import pandas as pd
>>> df = pd.DataFrame([ { 'col1' : 2**64 - 1 }, { 'col1' : 42 } ])
>>> df
                   col1
0  18446744073709551615
1                    42
>>> df.to_parquet('test.parquet')
>>> df = pd.read_parquet('test.parquet', filters=[('col1', '=', 42)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pandas/io/parquet.py", line 493, in read_parquet
    return impl.read(
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pandas/io/parquet.py", line 240, in read
    result = self.api.parquet.read_table(
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pyarrow/parquet.py", line 1941, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File "/tmp/pyarrow-test/lib/python3.8/site-packages/pyarrow/parquet.py", line 1776, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 491, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3235, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Integer value 18446744073709551615 not in range: 0 to 9223372036854775807
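For reference, the bounds in the error message line up with this reading: 9223372036854775807 is the signed int64 maximum (2**63 - 1), while the rejected value is the uint64 maximum (2**64 - 1). A quick sanity check in plain Python, no pyarrow needed:

```python
# The upper bound quoted in the ArrowInvalid message is the signed
# int64 maximum, not the uint64 maximum, so any uint64 value above
# 2**63 - 1 fails the filter's range validation.
INT64_MAX = 2**63 - 1   # 9223372036854775807, the bound in the error
UINT64_MAX = 2**64 - 1  # 18446744073709551615, the value in the column

print(INT64_MAX)              # → 9223372036854775807
print(UINT64_MAX)             # → 18446744073709551615
print(UINT64_MAX > INT64_MAX) # → True: the value is outside the checked range
```

Until this is fixed, reading the file without filters (which works, as noted above) and filtering afterwards in pandas, e.g. df[df['col1'] == 42], avoids the validation.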
> ArrowInvalid: Integer value
> ----------------------------
>
> Key: ARROW-15146
> URL: https://issues.apache.org/jira/browse/ARROW-15146
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet
> Affects Versions: 6.0.1
> Environment: Ubuntu 20.04, PyArrow 6.0.1, Python 3.9
> Reporter: mondonomo
> Priority: Major
>
> I've created a parquet db with a uint64 datatype. When reading, some of the
> files raise errors like
> {quote}ArrowInvalid: Integer value 12120467241726599441 not in range: 0 to
> 9223372036854775807
> {quote}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)