[
https://issues.apache.org/jira/browse/ARROW-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403455#comment-17403455
]
Weston Pace commented on ARROW-10439:
-------------------------------------
https://github.com/apache/arrow/pull/10955 (as part of ARROW-13650) adds a
`max_rows_per_file` option. Max bytes is a little trickier (`table.nbytes` is
the in-memory size and I assume one would want the on-disk size), although
doable (the file writers should be able to keep track of how many bytes
they've written, but they don't do this today). I'd prefer to avoid max bytes
unless someone has a need for it, though.
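
To illustrate the two ideas above, here is a minimal Python sketch, not Arrow code: it splits rows by a row limit and wraps each output sink so the bytes actually written can be counted. The names `CountingWriter`, `split_rows`, and `write_chunks` are hypothetical, and the newline-delimited text "format" stands in for a real file format purely for illustration.

```python
import io


class CountingWriter:
    """Hypothetical wrapper: forwards writes to a sink and counts the bytes."""

    def __init__(self, sink):
        self._sink = sink
        self.bytes_written = 0

    def write(self, data):
        n = self._sink.write(data)
        # Some sinks return None from write(); fall back to len(data).
        written = len(data) if n is None else n
        self.bytes_written += written
        return written


def split_rows(rows, max_rows_per_file):
    """Yield consecutive chunks holding at most max_rows_per_file rows."""
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")
    for start in range(0, len(rows), max_rows_per_file):
        yield rows[start:start + max_rows_per_file]


def write_chunks(rows, max_rows_per_file):
    """Write each chunk to its own in-memory 'file'.

    Returns one (row_count, bytes_written) pair per file.
    """
    files = []
    for chunk in split_rows(rows, max_rows_per_file):
        writer = CountingWriter(io.BytesIO())
        for row in chunk:
            # Toy encoding (assumption): one UTF-8 line per row.
            writer.write((row + "\n").encode("utf-8"))
        files.append((len(chunk), writer.bytes_written))
    return files
```

The row limit is known before anything is written, whereas the byte count is only known after encoding, which is why a byte limit would need the writers to report what they have written so far.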
> [C++][Dataset] Add max file size as a dataset writing option
> ------------------------------------------------------------
>
> Key: ARROW-10439
> URL: https://issues.apache.org/jira/browse/ARROW-10439
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 2.0.0
> Reporter: Ben Kietzman
> Assignee: Weston Pace
> Priority: Major
> Labels: beginner, dataset, query-engine
> Fix For: 6.0.0
>
>
> This should be specified as a row limit.