[ https://issues.apache.org/jira/browse/ARROW-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039672#comment-17039672 ]
Hongze Zhang commented on ARROW-7808:
-------------------------------------
I am not entirely sure, but based on the mailing list discussion I think
mapping one or two methods via JNI is not the final solution, only something
we can get started with. For the Parquet format, users may need access to
different Datasets layers, such as DataFragments for Parquet files and
ScanTasks for RowGroups. One may even need to decide whether the C++-level
post-scan filter should be enabled or disabled, whether a partition filter
should be applied, and so on. One or two methods cannot cover all of this.
Maintaining a JNI-based Datasets API may not be a heavy workload either,
because on the Java side things are just mirrored to some basic Datasets
concepts like DataSource and DataFragment, and should stay away from
re-implementing low-level logic such as scanning, projecting, and filtering.
Still, everything in C++ could be made available in Java, which is important
to many users.
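
To make the mirroring idea concrete, here is a rough sketch of what thin Java
counterparts might look like. Everything in it is an assumption for
illustration only: the interface and class names, the method signatures, the
native method, and the use of String as a stand-in for a record batch are all
hypothetical and are not an existing Arrow Java API. The in-memory classes
exist only so the sketch runs without a native library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical Java mirrors of the C++ Datasets concepts (names assumed).
interface ScanTask {
    // A String stands in for a record batch in this sketch.
    List<String> execute();
}

interface DataFragment {
    // For Parquet, one task per RowGroup.
    List<ScanTask> scan();
}

interface DataSource {
    // For Parquet, one fragment per file.
    List<DataFragment> getFragments();
}

// How a JNI-backed task might look: it keeps only a handle to the C++
// object and forwards the call, so no scan logic lives in Java.
// nativeExecute is hypothetical and is never called in this sketch.
final class NativeScanTask implements ScanTask {
    private final long nativeHandle;

    NativeScanTask(long nativeHandle) {
        this.nativeHandle = nativeHandle;
    }

    @Override
    public List<String> execute() {
        return nativeExecute(nativeHandle);
    }

    private static native List<String> nativeExecute(long handle);
}

// Pure-Java stand-in for a fragment, used only to make the sketch runnable.
final class InMemoryFragment implements DataFragment {
    private final List<String> rowGroups;

    InMemoryFragment(String... rowGroups) {
        this.rowGroups = Arrays.asList(rowGroups);
    }

    @Override
    public List<ScanTask> scan() {
        List<ScanTask> tasks = new ArrayList<>();
        for (String rowGroup : rowGroups) {
            tasks.add(() -> Collections.singletonList(rowGroup));
        }
        return tasks;
    }
}

final class DatasetsSketch {
    // Walks the layers a user could reach: source, then fragments, then tasks.
    static List<String> scanAll(DataSource source) {
        List<String> batches = new ArrayList<>();
        for (DataFragment fragment : source.getFragments()) {
            for (ScanTask task : fragment.scan()) {
                batches.addAll(task.execute());
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        DataSource source = () -> Arrays.<DataFragment>asList(
                new InMemoryFragment("file0/rg0", "file0/rg1"),
                new InMemoryFragment("file1/rg0"));
        System.out.println(scanAll(source));
    }
}
```

The point of the layering is that a user can stop at any level, for example
picking individual fragments or tasks, instead of going through a single
scan-everything entry point.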
> [Java][Dataset] Implement Datasets Java API
> --------------------------------------------
>
> Key: ARROW-7808
> URL: https://issues.apache.org/jira/browse/ARROW-7808
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++ - Dataset, Java
> Reporter: Hongze Zhang
> Priority: Major
> Labels: dataset
>
> Porting the following C++ Datasets APIs to Java:
> * DataSource
> * DataSourceDiscovery
> * DataFragment
> * Dataset
> * Scanner
> * ScanTask
> * ScanOptions