[ https://issues.apache.org/jira/browse/ARROW-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317406#comment-17317406 ]
David Li commented on ARROW-9697:
---------------------------------
I'm taking a swing at this and it'll be up once ARROW-11797 lands. Note that
Joris guessed correctly: the Parquet reader already implements the
optimization internally, so there's no need for a special method. The Parquet
reader will just fabricate a batch if it notices you aren't reading any columns.
> [C++][Dataset] num_rows method for Dataset/Scanner
> --------------------------------------------------
>
> Key: ARROW-9697
> URL: https://issues.apache.org/jira/browse/ARROW-9697
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Neal Richardson
> Assignee: David Li
> Priority: Major
> Labels: dataset
> Fix For: 4.0.0
>
>
> Something like Scanner::ToTable except first Project to keep 0 columns, and
> for each record batch, grab the num_rows. Then sum the resulting vector.
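
The approach in the description (project to zero columns, take each batch's
num_rows, sum the results) can be illustrated with a minimal sketch. It
deliberately does not use the Arrow C++ API: ToyColumn, ToyBatch, Project and
CountRows are hypothetical stand-ins invented for illustration, and the real
Scanner and RecordBatch types differ. The point it shows is that a batch
projected to zero columns still has to carry its row count, which is what
makes the sum meaningful.

```cpp
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Arrow types.
struct ToyColumn {
  std::string name;
  std::vector<int64_t> values;
};

struct ToyBatch {
  int64_t num_rows;                // row count is stored independently of columns
  std::vector<ToyColumn> columns;
};

// Keep only the columns named in `keep`. The row count is carried over
// unchanged, so projecting to zero columns still yields a batch with a length.
ToyBatch Project(const ToyBatch& batch, const std::vector<std::string>& keep) {
  ToyBatch out{batch.num_rows, {}};
  for (const auto& col : batch.columns) {
    for (const auto& name : keep) {
      if (col.name == name) {
        out.columns.push_back(col);
        break;
      }
    }
  }
  return out;
}

// Project every batch to zero columns, collect num_rows, then sum the vector.
int64_t CountRows(const std::vector<ToyBatch>& batches) {
  std::vector<int64_t> counts;
  counts.reserve(batches.size());
  for (const auto& batch : batches) {
    counts.push_back(Project(batch, {}).num_rows);
  }
  return std::accumulate(counts.begin(), counts.end(), int64_t{0});
}
```

In this toy model the empty projection copies no column data, which mirrors
why a reader that fabricates an empty batch with the right length makes a
dedicated counting method unnecessary.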