Hi Wes,

I'm still interested in doing the work, but I don't want to hold anybody up if someone else has bandwidth.

To actually make progress on this, my plan is to:
1. Help with the current Java review backlog through early next week or so (this has taken the majority of the time I've allocated for Arrow contributions over the last 6 months or so).
2. Shift all my attention to getting this done (this means no reviews, other than closing out existing ones that I've started, until it is done).

Hopefully, other Java committers can help shrink the backlog further (Jacques, thanks for your recent efforts here).

Thanks,
Micah

On Thu, Jan 9, 2020 at 8:16 AM Wes McKinney <wesmck...@gmail.com> wrote:
> hi folks,
>
> I think we have reached a point where the incomplete C++ Parquet
> nested data assembly/disassembly is harming the value of several
> other parts of the project, for example the Datasets API. As another
> example, it's possible to ingest nested data from JSON but not write
> it to Parquet in general.
>
> Implementing the nested data read and write path completely is a
> difficult project requiring at least several weeks of dedicated work,
> so it's not so surprising that it hasn't been accomplished yet. I know
> that several people have expressed interest in working on it, but I
> would like to see if anyone would be able to volunteer a commitment of
> time and a guess at a rough timeline for when this work could be done.
> It seems to me that if this slips beyond 2020 it will significantly
> diminish the value being created by other parts of the project.
>
> Since I'm pretty familiar with all the Parquet code, I'm one candidate
> to take on this project (and I can dedicate the time, but it would
> come at the expense of other projects where I can also be useful).
> But Micah and others expressed interest in working on it, so I wanted
> to have a discussion about it to see what others think.
>
> Thanks
> Wes