[ https://issues.apache.org/jira/browse/ARROW-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated ARROW-6720:
----------------------------------
Labels: (was: pull-request-available)
> [JAVA][C++]Support Parquet Read and Write in Java
> -------------------------------------------------
>
> Key: ARROW-6720
> URL: https://issues.apache.org/jira/browse/ARROW-6720
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Java
> Affects Versions: 0.15.0
> Reporter: Chendi.Xue
> Assignee: Chendi.Xue
> Priority: Major
> Fix For: 6.0.0
>
> Time Spent: 38.5h
> Remaining Estimate: 0h
>
> We added a new Java interface that supports reading and writing Parquet
> files on HDFS or the local file system.
> The motivation is that when loading and dumping Parquet data in Java, we can
> only use row-based put and get methods. Since Arrow already has a C++
> implementation for loading and dumping Parquet, we wrapped that code as Java
> APIs.
> In our tests, performance on our workload improved by more than 2x compared
> with row-based load and dump, so we would like to contribute this code to
> Arrow.
> Since this is a fully independent change, it does not modify any existing
> Arrow code. We added two folders: java/adapter/parquet and
> cpp/src/jni/parquet.