[ 
https://issues.apache.org/jira/browse/ARROW-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580174#comment-17580174
 ] 

Joris Van den Bossche commented on ARROW-17393:
-----------------------------------------------

The integer number you have here is too large to fit in the int64 range:

{code:python}
>>> import numpy as np
>>> np.iinfo("int64").max  # same as 2**63 - 1
9223372036854775807
{code}

That means this value cannot be parsed into a pyarrow column of type int64. 
Explicitly asking for that type also raises an error:

{code:python}
>>> import pyarrow as pa
>>> pa.array([123451234512345123451234512], type="int64")
...
OverflowError: Python int too large to convert to C long
{code}

As a result, pyarrow falls back to parsing the number as a float64. But 
float64 has its own inherent limits: with a 53-bit significand it cannot 
represent every integer above 2**53 exactly, so it is expected that such a 
number is rounded to the nearest representable value rather than stored 
faithfully.
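
To make the rounding concrete, here is a small sketch in plain Python (no 
pyarrow involved) that reproduces the value from the report. It assumes that 
the JSON reader rounds to the nearest float64 in the same way Python's own 
float() conversion does; the variable names are only illustrative, and 
math.ulp needs Python 3.9 or later.

```python
import math

# The integer from the reported JSON document.
n = 123451234512345123451234512

# Largest value an int64 can hold; n is far beyond it.
INT64_MAX = 2**63 - 1
too_big_for_int64 = n > INT64_MAX

# Converting to float rounds to the nearest representable float64.
as_float = float(n)

# Spacing between adjacent float64 values at this magnitude.
spacing = math.ulp(as_float)

# How far the stored float is from the original integer.
error = int(as_float) - n

print(too_big_for_int64)  # True
print(int(as_float))      # 123451234512345125900779520
print(spacing == 2.0**34) # True
print(error)              # 2449545008
```

At this magnitude neighbouring float64 values are 2**34 apart, so an error of 
a few billion is simply the distance to the nearest representable value and 
not a sign of a faulty conversion.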

> [Python] pyarrow large integer conversion
> -----------------------------------------
>
>                 Key: ARROW-17393
>                 URL: https://issues.apache.org/jira/browse/ARROW-17393
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Donald Freeman
>            Priority: Major
>
> I have a JSON document that looks like this. 
> {"number": 123451234512345123451234512}
> I then run the below code. 
> >>> from pyarrow.json import read_json
> >>> pyarrow_table = read_json('pyarrow_test.json')
> >>> pyarrow_table['number'][0].as_py().as_integer_ratio()
> (123451234512345125900779520, 1)
> Notice that the float I get looks like it has been rounded or modified in 
> some way.
>  
> Am I reading this file incorrectly or is there an issue with the conversion 
> of this number?


