[
https://issues.apache.org/jira/browse/ARROW-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486115#comment-17486115
]
Weston Pace commented on ARROW-15146:
-------------------------------------
The original post is a little too vague to say for certain what is happening.
As for [~mattcarothers]'s issue (thank you for the reproducible test case), the
problem is that 42 is interpreted as an int64 scalar. When the filtering logic
kicks in, it compares the uint64 array with that int64 scalar and decides to
downcast the uint64 array, which fails for any value above the int64 maximum.
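You can see the inference directly (a minimal sketch, nothing beyond stock
pyarrow assumed):
{code}
import pyarrow as pa

# A bare Python int is inferred as a signed 64-bit scalar.
print(pa.scalar(42).type)                    # int64
# Supplying the type explicitly keeps it unsigned.
print(pa.scalar(42, type=pa.uint64()).type)  # uint64
{code}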
A workaround is:
{code}
import pandas as pd
import pyarrow as pa

df = pd.read_parquet('test.parquet',
                     filters=[('col1', '=', pa.scalar(42, type=pa.uint64()))])
{code}
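For reference, here is a sketch of reproducing the failure end to end (the
file and column names follow the snippet above and are otherwise hypothetical,
not the exact test case from the thread):
{code}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Write a uint64 column containing a value above the int64 maximum.
table = pa.table({'col1': pa.array([42, 12120467241726599441], type=pa.uint64())})
pq.write_table(table, 'test.parquet')

# With a plain Python literal the filter value is inferred as int64, the uint64
# column gets downcast for the comparison, and ArrowInvalid is raised:
#   pd.read_parquet('test.parquet', filters=[('col1', '=', 42)])
# The explicitly typed scalar shown above avoids the downcast.
{code}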
I'm not entirely sure if this is a bug or not. The casting logic is pretty
complex as it is. However, preferring to cast literals before casting arrays
might be a reasonable rule (it also leads to better performance).
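To illustrate why that ordering would help (a sketch; the oversized value is
taken from the report below):
{code}
import pyarrow as pa

arr = pa.array([42, 12120467241726599441], type=pa.uint64())

# Downcasting the uint64 array to int64 (roughly what the filter logic does
# today) fails as soon as a value exceeds the int64 range.
try:
    arr.cast(pa.int64())
except pa.ArrowInvalid as e:
    print(e)

# Casting the int64 literal up to uint64 instead is lossless and cheap.
print(pa.scalar(42).cast(pa.uint64()))
{code}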
> ArrowInvalid: Integer value
> ----------------------------
>
> Key: ARROW-15146
> URL: https://issues.apache.org/jira/browse/ARROW-15146
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet
> Affects Versions: 6.0.1
> Environment: Ubuntu 20.04, PyArrow 6.0.1, Python 3.9
> Reporter: mondonomo
> Priority: Major
>
> I've created a parquet db with a uint64 datatype. When reading, some of the
> files raise errors like
> {quote}ArrowInvalid: Integer value 12120467241726599441 not in range: 0 to
> 9223372036854775807
> {quote}