[
https://issues.apache.org/jira/browse/ARROW-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103908#comment-17103908
]
Micah Kornfield commented on ARROW-7905:
----------------------------------------
There are a few areas I've been mulling over:
1. Doing a better job of separating pure Parquet reading/writing from Arrow
reading/writing (if this is important).
2. Better APIs for incremental reading to Arrow (I think there are some TODOs
sprinkled around in various places).
3. Better APIs for exposing RLE information (and making use of it when
translating to Arrow). For example, [https://github.com/apache/arrow/pull/7143]
starts to make better use of RLE, but it could potentially be better if we didn't
have to reconstruct runs based on nulls (note there is currently a performance
regression which I need to fix).
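
To make point 3 a bit more concrete, here is a rough Go sketch of what
"reconstructing runs based on nulls" means for a flat optional column. It is
only an illustration, not how the C++ reader or the PR above does it: the Run
type and the RunsFromDefLevels function are hypothetical names, and it assumes
the definition levels have already been fully decoded into a slice, with a
value being null whenever its level is below the maximum definition level.

```go
package main

import "fmt"

// Run is a hypothetical type for this sketch: a maximal stretch of
// consecutive slots that are either all null or all non-null.
type Run struct {
	Null   bool
	Length int
}

// RunsFromDefLevels rebuilds validity runs from already-decoded definition
// levels. Assumption: a flat (non-nested) optional column, where a slot is
// null when its definition level is below maxDefLevel.
func RunsFromDefLevels(defLevels []int16, maxDefLevel int16) []Run {
	var runs []Run
	for _, lvl := range defLevels {
		isNull := lvl < maxDefLevel
		// Extend the current run if the nullness did not change.
		if n := len(runs); n > 0 && runs[n-1].Null == isNull {
			runs[n-1].Length++
			continue
		}
		// Otherwise start a new run of length one.
		runs = append(runs, Run{Null: isNull, Length: 1})
	}
	return runs
}

func main() {
	// Three values, two nulls, one value, three nulls.
	defLevels := []int16{1, 1, 1, 0, 0, 1, 0, 0, 0}
	fmt.Println(RunsFromDefLevels(defLevels, 1))
	// prints: [{false 3} {true 2} {false 1} {true 3}]
}
```

The loop visits every level even though the encoder may already have stored
long stretches of identical levels as single RLE runs. An API that exposed
those runs directly would let the Arrow conversion skip this per-value pass.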
> [Go][Parquet] Port the C++ Parquet implementation to Go
> -------------------------------------------------------
>
> Key: ARROW-7905
> URL: https://issues.apache.org/jira/browse/ARROW-7905
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Go
> Reporter: Nick Poorman
> Assignee: Nick Poorman
> Priority: Minor
> Labels: Go, Parquet, golang
> Time Spent: 88.2h
> Remaining Estimate: 36h 23m
>
> I’m currently in the process of porting the C++ version of Parquet in the
> Apache Arrow project to Golang. Many projects and companies have been and are
> building their data lakes and persistence layer using Parquet. Apache Spark
> uses it heavily for persistence (including Databricks Delta Lake).
> To me this is the missing component for people to truly begin using the Go
> implementation of Arrow with any existing data architectures.
> If you have any interest in this project, give this issue a watch as it will
> keep me motivated to finish the port. Also, if you have specific use cases
> feel free to drop them in here so I can keep them in mind as I continue with
> the port.
> Things with the code base are rather in flux at the moment as I figure out
> how to solve various nuances between the features of C++ and Go. As soon as I
> have a solid chunk of the port working, I’ll create a PR in the Apache Arrow
> project on GitHub and let everyone know in here.