[ https://issues.apache.org/jira/browse/ARROW-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302561#comment-17302561 ]
David Li commented on ARROW-8631:
---------------------------------
Following on from ARROW-9749, we now have ConvertOptions. ReadOptions doesn't
fit neatly into either the dataset-global or the scan-specific bucket
(skip_rows and column names belong in the former; block_size belongs in the
latter), so its fields will have to be inlined. Alternatively, it could be made
part of CsvFileFormat, with an explicit block_size field in
CsvFragmentScanOptions.
I'll also plumb the options through to Python and add a ScannerBuilder method
for setting them.
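A minimal sketch of the inlining option described above, written in plain Python with hypothetical class names (ReadOptionsSketch, CsvFileFormatSketch, CsvFragmentScanOptionsSketch) rather than the real Arrow classes. It assumes the split stated in the comment: skip_rows and column names are fixed per dataset and live on the format, while block_size travels with the scan options and can be overridden per scan (for example by a ScannerBuilder-style setter, which is only modelled here, not Arrow's actual method).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReadOptionsSketch:
    """Hypothetical stand-in for csv::ReadOptions, limited to the fields discussed."""
    skip_rows: int = 0
    column_names: List[str] = field(default_factory=list)
    block_size: int = 1 << 20  # default chosen for illustration


@dataclass
class CsvFragmentScanOptionsSketch:
    """Hypothetical scan-specific bucket: may differ between scans of one dataset."""
    block_size: int = 1 << 20


@dataclass
class CsvFileFormatSketch:
    """Hypothetical dataset-global bucket: fixed once the dataset (and schema) exists."""
    skip_rows: int = 0
    column_names: List[str] = field(default_factory=list)
    default_scan_options: CsvFragmentScanOptionsSketch = field(
        default_factory=CsvFragmentScanOptionsSketch)


def split_read_options(opts: ReadOptionsSketch) -> CsvFileFormatSketch:
    """Inline each ReadOptions field into the bucket it belongs to."""
    return CsvFileFormatSketch(
        skip_rows=opts.skip_rows,
        column_names=list(opts.column_names),
        default_scan_options=CsvFragmentScanOptionsSketch(block_size=opts.block_size),
    )


def effective_block_size(fmt: CsvFileFormatSketch,
                         scan_opts: Optional[CsvFragmentScanOptionsSketch] = None) -> int:
    """A per-scan override wins over the format's default scan options."""
    chosen = scan_opts if scan_opts is not None else fmt.default_scan_options
    return chosen.block_size
```

For example, `split_read_options(ReadOptionsSketch(skip_rows=1, column_names=["a", "b"], block_size=4096))` yields a format whose default block size is 4096, while passing `CsvFragmentScanOptionsSketch(block_size=65536)` to `effective_block_size` changes only that one scan. The design point is that nothing scan-specific has to be baked into the format.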
> [C++][Dataset] Add ConvertOptions and ReadOptions to CsvFileFormat
> ------------------------------------------------------------------
>
> Key: ARROW-8631
> URL: https://issues.apache.org/jira/browse/ARROW-8631
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 0.17.0
> Reporter: Ben Kietzman
> Assignee: David Li
> Priority: Major
> Labels: dataset, pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/7033 does not add ConvertOptions
> (including alternate spellings for null/true/false, etc.) or ReadOptions
> (block_size, column name customization, etc). These will be helpful but will
> require some discussion to find the optimal way to integrate them with
> dataset::
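To illustrate what the alternate null/true/false spellings in ConvertOptions provide, here is a small stand-alone Python sketch (not Arrow code) of per-cell conversion driven by configurable spellings. The field names mirror null_values, true_values and false_values in arrow::csv::ConvertOptions, but the class ConvertOptionsSketch, the helper functions, and the spellings used are hypothetical illustrations, not Arrow's defaults or implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ConvertOptionsSketch:
    # Illustrative spellings only; these are not Arrow's actual default lists.
    null_values: FrozenSet[str] = frozenset({"", "NA", "null"})
    true_values: FrozenSet[str] = frozenset({"true", "1"})
    false_values: FrozenSet[str] = frozenset({"false", "0"})


def convert_bool_cell(cell: str, opts: ConvertOptionsSketch) -> Optional[bool]:
    """Map one raw CSV cell to True, False, or None (null) using the configured spellings."""
    if cell in opts.null_values:
        return None
    if cell in opts.true_values:
        return True
    if cell in opts.false_values:
        return False
    raise ValueError(f"cannot convert {cell!r} to bool")


def convert_bool_column(cells: List[str], opts: ConvertOptionsSketch) -> List[Optional[bool]]:
    """Convert every cell of a boolean column with the same options."""
    return [convert_bool_cell(c, opts) for c in cells]
```

With files that write booleans as "Y"/"N" and nulls as "-", `convert_bool_column(["Y", "N", "-"], ConvertOptionsSketch(null_values=frozenset({"-"}), true_values=frozenset({"Y"}), false_values=frozenset({"N"})))` returns `[True, False, None]`. In a dataset, one such options object would apply to every file, which is why where it lives (format vs. scan options) matters.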
--
This message was sent by Atlassian Jira
(v8.3.4#803005)