More design work is needed on the metadata specification -- I would not use either codebase as a reference for doing a different implementation until the specification / format documents are complete enough to enable a "clean room" implementation.
I will leave Steven or Jacques from the Drill side to comment on ways to jump into the Java code.

- Wes

On Fri, Jun 10, 2016 at 1:27 PM, Kiril Menshikov <kmenshi...@gmail.com> wrote:
> Hi Wes,
>
> What is the most complete arrow version at the moment? I can see C++ and
> Python are most active and Java was copied from Drill. So does this mean that
> we can use the C++ version as a reference?
>
> I also want to help you. I can do metadata read/write, if nobody is doing it.
>
> Thanks,
> -Kiril
>
>> On Jun 9, 2016, at 21:24, Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> Since we are at the "chicken" stage of the chicken-and-egg problem I
>> don't have straightforward guidance about how to proceed, other than
>> to dig in to the current Java and / or C++ codebases and help sort
>> out what needs to be done. It may be beneficial on the
>> mailing list to discuss the incremental steps required to reach
>> working integration tests (I suspect there will be many JIRAs /
>> patches required to get there) -- defining these tasks (perhaps in a
>> shared Google document) and creating the associated JIRAs is a
>> valuable and necessary exercise.
>>
>> (Personally I've been most interested in "up stack" integration with
>> other projects like Apache Parquet as it relates to native code
>> consumers (e.g. Python libraries).)
>>
>> On the IPC side, you can look at the internal IPC round trip C++ tests:
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/ipc-adapter-test.cc
>>
>> On the Java side, an initial task would be to create a testing setup
>> that generates sample data and either sends and receives it through a
>> socket or memory map.
>> There are other questions to analyze:
>>
>> - Schema negotiation
>> - Metadata read / write (see
>>   https://github.com/apache/arrow/blob/master/format/Message.fbs)
>>
>> As discussed there are some inconsistencies between the reference
>> implementations that we will need to resolve before this work can
>> proceed to completion. The metadata (schemas and logical types, e.g.
>> what is in Message.fbs) are also in flux and will require a round of
>> iteration.
>>
>> Thanks,
>> Wes
>>
>> On Thu, Jun 9, 2016 at 8:32 AM, Nicole Nemer <nicole.ne...@rms.com> wrote:
>>> Hi Wes.
>>> Would love to help. Just point me to the tests that need to be
>>> expanded/written and I will work on that today/tomorrow.
>>> Thanks!
>>> nn
>>> --
>>> Nicole Nemer, PhD
>>> Technical Architect/Dev Manager
>>>
>>> 303-641-3340
>>>
>>> On 6/8/16, 5:47 PM, "Wes McKinney" <wesmck...@gmail.com> wrote:
>>>
>>>> hi Nicki
>>>>
>>>> Micah's patch for #1 is in progress here
>>>> https://github.com/apache/arrow/pull/85
>>>>
>>>> I believe Steven Phillips is working on a patch toward reconciling the
>>>> Java implementation with the current working version of the spec. We
>>>> need to be able to verify that memory can be passed between Java and
>>>> C++ with full fidelity (using files / memory maps as the exchange
>>>> medium to start); these integration tests will also help other Arrow
>>>> implementations validate their compatibility. It would be
>>>> great to have some additional help here.
>>>>
>>>> cheers
>>>> Wes
>>>>
>>>> On Thu, Jun 2, 2016 at 7:10 AM, Nicole Nemer <nicole.ne...@rms.com> wrote:
>>>>> Good Morning Micah,
>>>>> How is 1 going, please? Anything that I can do to help?
>>>>>
>>>>> Anyone with more insight on 2 please?
>>>>>
>>>>> Thanks,
>>>>> nicki
>>>>> --
>>>>> Nicole Nemer, PhD
>>>>> Technical Architect/Dev Manager
>>>>>
>>>>> 303-641-3340
>>>>>
>>>>> On 5/27/16, 9:51 AM, "Micah Kornfield" <emkornfi...@gmail.com> wrote:
>>>>>
>>>>>> Hi Nicki,
>>>>>> 1. I'm currently working on the char/string support for C++. I've
>>>>>> been a little bit backlogged on it. If I don't make substantial
>>>>>> progress this weekend, I'm happy to relinquish the task.
>>>>>>
>>>>>> 2. I'll let someone more knowledgeable about the Java implementation
>>>>>> chime in, but I think the answer is a qualified yes. We were just
>>>>>> talking about trying to make the first integration test that proves
>>>>>> C++/Java compatibility [1]
>>>>>>
>>>>>> 3. Yes it is easy to become a contributor. The general workflow is
>>>>>> to chime in on jira item [2] (or someone on the PMC? can make you a
>>>>>> contributor so you can assign a ticket to yourself), and submit a pull
>>>>>> request via github with "ARROW-<JIRA-NUMBER>:" as the start of the
>>>>>> pull request title. In addition to the items mentioned below there is
>>>>>> a pretty substantial backlog of items to work on if you are interested
>>>>>> in contributing generally.
>>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>> [1]
>>>>>> http://mail-archives.apache.org/mod_mbox/arrow-dev/201605.mbox/%3CCAK7Z5T8X2OiWfSoQ0S-3vu0D4zgkuAO-SD_Q%3DF2Pu%3D4GhaTFbQ%40mail.gmail.com%3E
>>>>>> [2]
>>>>>> https://issues.apache.org/jira/browse/ARROW/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel
>>>>>>
>>>>>> On Fri, May 27, 2016 at 7:35 AM, Nicole Nemer <nicole.ne...@rms.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> 1. does cpp/ipc support char/string types? If not - when please?
>>>>>>> 2. Is there a java implementation of the ipc feature please? If
>>>>>>> not - when please?
>>>>>>> 3. Is it easy to join and help as a contributor?
>>>>>>> I would love to help with these 2 items if they are planned for
>>>>>>> the near future.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nicki
>>>>>>> --
>>>>>>> Nicole Nemer, PhD
>>>>>>> Technical Architect/Dev Manager