> I'm not saying that I think the work definitely will not be completed,
> but rather that we should put a date on the calendar as the target
> date for 1.0.0 and stick to it. If the work gets done, that's great.

I'm okay with this as well; I just want to make sure we have a path forward to ultimately get two reference implementations.

On Wed, Apr 22, 2020 at 5:50 AM Wes McKinney wrote:

> hi Micah,
>
> I'm not saying that I think the work definitely will not be completed,
> but rather that we should put a date on the calendar as the target
> date for 1.0.0 and stick to it. If the work gets done, that's great.
>
> 10 to 12 weeks from now would mean releasing 1.0.0 either the week of
> June 29 or July 6. That is about 1 year since we discussed and adopted
> our SemVer policy [1].
>
> > I would propose that if there isn't an implementation in any language we
> > might drop it as part of the specification. The main feature that I think
> > meets this criterion is the Dictionary of Dictionary columns (Is this
> > supported in C++)?
>
> I don't have a strong view on this, but IIUC this is implemented in
> JavaScript and probably not far off in C++.
>
> - Wes
>
> [1]:
> https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E
>
> On Wed, Apr 22, 2020 at 12:26 AM Micah Kornfield wrote:
> >
> > Hi Wes,
> > I think we might be closer than we think on the Java side to having the
> > functionality listed (I've added comments inline at the end with the
> > features you listed in the original e-mail).
> >
> > My biggest concern is I don't think there is a clear path forward for
> > Sparse Unions. Getting compatibility for Sparse Unions would require more
> > invasive/breaking changes to the Java code base. [1] is the last thread on
> > the issue. I sadly have not had time to get back to this, nor will I
> > probably have time before the next release.
> >
> > I would propose that if there isn't an implementation in any language we
> > might drop it as part of the specification.
> > The main feature that I think meets this criterion is the Dictionary of
> > Dictionary columns (Is this supported in C++)?
> >
> > Thanks,
> > Micah
> >
> > > * custom_metadata fields
> >
> > Not sure about this one.
> >
> > > * Extension Types
> >
> > There is already an implementation in Java; it probably needs more work
> > for integration testing.
> >
> > > * Large (64-bit offset) variable size types
> >
> > There is an open PR for string/binary types. LargeList is of more
> > questionable value until Java supports vectors/arrays with more than 2^32
> > elements.
> >
> > > * Delta and Replacement Dictionaries
> >
> > There is already an implementation in Java; it probably needs more work,
> > specifically for integration testing.
> >
> > > * Unions
> >
> > There is an implementation for dense unions (it likely needs more work for
> > integration testing).
> >
> > On Tue, Apr 21, 2020 at 11:26 AM Neal Richardson wrote:
> >
> > > I'm all for making our next release be 1.0. Everything is about tradeoffs,
> > > and while I too would like to see a complete Java implementation, I think
> > > the costs of further delaying 1.0 outweigh the benefits of holding it
> > > indefinitely in hopes that there will be enough availability of Java
> > > developers to finish integration testing.
> > >
> > > Neal
> > >
> > > On Tue, Apr 21, 2020 at 10:55 AM Wes McKinney wrote:
> > >
> > > > hi Bryan -- with the way that things are going, if we were to block
> > > > the 1.0.0 release on completing the Java work, it could be a very long
> > > > time to wait (long time = more than 6 months from now). I don't think
> > > > that's acceptable. The Versioning document was formally adopted last
> > > > August, so a year will soon have elapsed since we previously said
> > > > we wanted to have everything integration tested.
> > > >
> > > > With what I'm proposing, the primary things that would not be tested
> > > > (if there is no progress in Java) are:
> > > >
> > > > * custom_metadata fields
> > > > * Extension Types
> > > > * Large (64-bit offset) variable size types
> > > > * Delta and Replacement Dictionaries
> > > > * Unions
> > > >
> > > > These do not seem like huge sacrifices, or at least not ones that
> > > > compromise the stability of the columnar format. Of course, if some of
> > > > them are completed in the next 10-12 weeks, then that's great.
> > > >
> > > > - Wes
> > > >
> > > > On Tue, Apr 21, 2020 at 12:12 PM Bryan Cutler wrote:
> > > > >
> > > > > I really would like to see a 1.0.0 release with complete implementations
> > > > > for C++ and Java. From my experience, that interoperability has been a
> > > > > major selling point for the project. That being said, my time for
> > > > > contributions has been pretty limited lately and I know that Java has been
> > > > > lagging, so if the rest of the community would like to push forward with a
> > > > > reduced scope, that is okay with me. I'll still continue to do what I can
> > > > > on Java to fill in the gaps.
> > > > >
> > > > > Bryan
> > > > >
> > > > > On Tue, Apr 21, 2020 at 8:47 AM Wes McKinney wrote:
> > > > > >
> > > > > > Hi all -- are there some opinions about this?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Thu, Apr 16, 2020 at 5:30 PM Wes McKinney wrote:
> > > > > > >
> > > > > > > hi folks,
> > > > > > >
> > > > > > > Previously we had discussed a plan for making a 1.0.0 release based on
> > > > > > > completeness of columnar format integration tests and making
> > > > > > > forward/backward compatibility guarantees as formalized in
> > > > > > >
> > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst
> > > > > > >
> > > > > > > In particular, we wanted to demonstrate comprehensive Java/C++
> > > > > > > interoperability.
> > > > > > >
> > > > > > > As time has passed, we have stalled out a bit on completing integration
> > > > > > > tests for the "long tail" of data types and columnar format features.
> > > > > > >
> > > > > > > https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit?usp=sharing
> > > > > > >
> > > > > > > As such, I wanted to propose a reduction in scope so that we can make a
> > > > > > > 1.0.0 release sooner. The plan would be as follows:
> > > > > > >
> > > > > > > * Endeavor to have integration tests implemented and working in at
> > > > > > >   least one reference implementation (likely to be the C++ library). It
> > > > > > >   seems important to verify that what's in Columnar.rst can be
> > > > > > >   implemented unambiguously.
> > > > > > > * Indicate in Versioning.rst or another place in the documentation the
> > > > > > >   list of data types or advanced columnar format features (like
> > > > > > >   delta/replacement dictionaries) that are not yet fully integration
> > > > > > >   tested.
> > > > > > >
> > > > > > > Some of the essential protocol stability details and all of the most
> > > > > > > commonly used data types have been stable for a long time now,
> > > > > > > particularly after the recent alignment change. The current list of
> > > > > > > features that aren't being tested for cross-implementation
> > > > > > > compatibility should not pose a risk to downstream users.
> > > > > > >
> > > > > > > Thoughts about this? The 1.0.0 release is an important milestone for
> > > > > > > the project and will help build continued momentum in developer and
> > > > > > > user community growth.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Wes
