[jira] [Commented] (ARROW-13028) [C++] CSV add convert option to attempt 32bit number inferences

Nate Clark (Jira) Mon, 27 Sep 2021 07:15:09 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420767#comment-17420767
 ]


Nate Clark commented on ARROW-13028:
------------------------------------

I agree that the largest type could be considered the safest, especially for 
floating point.
In theory inference could start at int8 and work up from there if there is any 
interest in that, but distinguishing signed from unsigned is probably not as 
beneficial.
For floating point, detection is more difficult: the format is already 
considered imprecise, so parsers will force values to fit the requested size, 
and detection of double vs float would have to be done outside the parser.
I did put up the linked MR for int32 detection so that you can at least see 
that implemented.
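
To make the idea concrete, here is a minimal standalone sketch of "try 32 bit 
first, widen when a value does not fit", plus the kind of out-of-parser check 
that float vs double detection would need. It is not the code from the linked 
MR and does not use any Arrow API: InferredInt, ParseInt64, InferIntegerWidth 
and FitsInFloat are hypothetical names made up for this illustration, and it 
assumes C++17 for std::from_chars.

```cpp
#include <charconv>
#include <cmath>
#include <cstdint>
#include <limits>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical result type for this sketch; not an Arrow type.
enum class InferredInt { kInt32, kInt64, kNotInteger };

// Parse one CSV field as a signed 64-bit integer.
// Returns false if the field is empty, malformed, or out of int64 range.
// Note: std::from_chars does not accept a leading '+' or whitespace.
inline bool ParseInt64(std::string_view field, std::int64_t* out) {
  const char* first = field.data();
  const char* last = first + field.size();
  auto result = std::from_chars(first, last, *out);
  return result.ec == std::errc() && result.ptr == last;
}

// Start from the narrow type and widen only when a value does not fit.
// An empty column is reported as kInt32 in this sketch.
inline InferredInt InferIntegerWidth(
    const std::vector<std::string_view>& column) {
  InferredInt inferred = InferredInt::kInt32;
  for (std::string_view field : column) {
    std::int64_t value = 0;
    if (!ParseInt64(field, &value)) return InferredInt::kNotInteger;
    if (value < std::numeric_limits<std::int32_t>::min() ||
        value > std::numeric_limits<std::int32_t>::max()) {
      inferred = InferredInt::kInt64;
    }
  }
  return inferred;
}

// A float parser would silently round, so the check has to happen on the
// already parsed double: does the value survive a round trip through float?
inline bool FitsInFloat(double value) {
  // Guard first: converting an out-of-range finite double to float is
  // undefined behavior.
  if (std::isfinite(value) &&
      std::fabs(value) > std::numeric_limits<float>::max()) {
    return false;
  }
  return static_cast<double>(static_cast<float>(value)) == value;
}
```

With this sketch a column of "1", "-42", "2147483647" stays int32, while a 
single "2147483648" widens the whole column to int64. FitsInFloat(0.5) holds 
but FitsInFloat(0.1) does not, which is why a strict round-trip test would 
push most decimal data to double anyway.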

> [C++] CSV add convert option to attempt 32bit number inferences
> ---------------------------------------------------------------
>
>                 Key: ARROW-13028
>                 URL: https://issues.apache.org/jira/browse/ARROW-13028
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Nate Clark
>            Assignee: Nate Clark
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When types are being inferred by the CSV reader, numbers are always 64 bit. 
> For large data sets it could be better to use 32 bit types to save overall 
> memory. To do this it would be useful to add an option to ConvertOptions to 
> try 32 bit numbers before 64 bit. By default this option would be disabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
