Just to give an update: I've been a little delayed, but my progress is as follows:

1. Had one PR merged that will exercise basic end-to-end tests.
2. Have another PR open that adds a configuration option in C++ to determine which algorithm version to use for reading/writing: the existing version, or the new version that supports complex nested arrays. I think a large amount of code will be reused or delegated to, but I will err on the side of not touching the existing code/algorithms, so that any errors in the implementation or performance regressions can hopefully be mitigated at runtime. I expect that in later releases (once the code has "baked") the option will become a no-op.
3. Started coding the write path.
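
To make item 2 above concrete, here is a minimal sketch of how a runtime option could select between the two algorithm versions. All names (`EngineVersion`, `WriterOptions`, `WriteNested`, and the two helper functions) are hypothetical and not taken from the PR, and making the new version the default is an assumption on top of what is described above.

```cpp
#include <string>

// Hypothetical names throughout; this is not the API from the open PR.
enum class EngineVersion {
  kExisting,  // the current algorithm, left untouched
  kNew        // the new algorithm with complex nested array support
};

struct WriterOptions {
  // Assumption: the new version is the default, and callers can switch
  // back to the existing one at runtime if they hit a bug or regression.
  EngineVersion engine_version = EngineVersion::kNew;
};

// Stand-ins for the two implementations; they return a label so the
// dispatch is observable without any real Parquet I/O.
std::string WriteNestedExisting() { return "existing"; }
std::string WriteNestedNew() { return "new"; }

// Dispatch on the configured version. The existing code path is never
// modified; the new one lives beside it.
std::string WriteNested(const WriterOptions& options) {
  switch (options.engine_version) {
    case EngineVersion::kExisting:
      return WriteNestedExisting();
    case EngineVersion::kNew:
      return WriteNestedNew();
  }
  return WriteNestedExisting();  // unreachable; keeps compilers quiet
}
```

With a layout like this, a caller who sets `engine_version` to `EngineVersion::kExisting` gets the old behavior back without a rebuild, and once the new path has baked the option can simply be ignored.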

Which leaves:

1. Finishing the write path (I estimate 2-3 weeks to be code complete).
2. Implementing the read path.

Again, I'm happy to collaborate if people have bandwidth and want to contribute.

Thanks,
Micah

On Thu, Jan 9, 2020 at 10:31 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Wes,
> I'm still interested in doing the work, but I don't want to hold anybody
> up if they have bandwidth.
>
> In order to actually make progress on this, my plan will be to:
> 1. Help with the current Java review backlog through early next week or
> so (this has been taking the majority of my time allocated for Arrow
> contributions for the last 6 months or so).
> 2. Shift all my attention to trying to get this done (this means no
> reviews other than closing out existing ones that I've started until it is
> done). Hopefully, other Java committers can help shrink the backlog
> further (Jacques, thanks for your recent efforts here).
>
> Thanks,
> Micah
>
> On Thu, Jan 9, 2020 at 8:16 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi folks,
>>
>> I think we have reached a point where the incomplete C++ Parquet
>> nested data assembly/disassembly is harming the value of several
>> other parts of the project, for example the Datasets API. As another
>> example, it's possible to ingest nested data from JSON but not write
>> it to Parquet in general.
>>
>> Implementing the nested data read and write path completely is a
>> difficult project requiring at least several weeks of dedicated work,
>> so it's not so surprising that it hasn't been accomplished yet. I know
>> that several people have expressed interest in working on it, but I
>> would like to see if anyone would be able to volunteer a commitment of
>> time and guess on a rough timeline when this work could be done. It
>> seems to me if this slips beyond 2020 it will significantly diminish the
>> value being created by other parts of the project.
>>
>> Since I'm pretty familiar with all the Parquet code I'm one candidate
>> person to take on this project (and I can dedicate the time, but it
>> would come at the expense of other projects where I can also be
>> useful). But Micah and others expressed interest in working on it, so
>> I wanted to have a discussion about it to see what others think.
>>
>> Thanks
>> Wes
>>
>