Wes is right, I should have qualified my statement about the Drill code. As stated in the initial design docs in the Arrow repo, the exact memory layout is not finalized. That said, since the format is designed to be used in memory, it does not have the same sticking point around backwards compatibility that a persistent format has. Eventually someone may use Arrow structures for a long-lived in-memory cache, or persist Arrow vectors to disk, but this would not be a good time to start such a project, as the format is not fully defined.
A more appropriate statement would have been: in Drill (and soon to be fully moved over to the Arrow repo) there is an interface that you can use to access Arrow-like structures, and it will evolve along with the ongoing development of the Arrow standard. If you are willing to work alongside the upcoming refactorings, you could start integrating these interfaces into other projects. The in-memory structures they use do not yet represent a version of the Arrow specification, because we have not finished discussing several parts of it, as Wes summarized nicely, but they will be updated throughout the upcoming discussions.

On Sat, Feb 27, 2016 at 11:11 AM, Wes McKinney <w...@cloudera.com> wrote:

> Note that we have not prioritized building a lot of new software for
> Arrow (outside of the basic C++ implementation and the Drill Java
> extraction) because there are a number of details that we need to work
> out as a group in the coming weeks:
>
> - Lingering physical memory layout questions, see working documents
> https://github.com/apache/arrow/tree/master/format
> - Metadata / schema details
> - IPC / wire protocol
>
> As a project, these aspects of the Arrow specification are much more
> important than any lines of code, because they define what it means to
> "use Arrow". So getting started with Arrow is less about using a
> particular piece of software but rather conforming data structures and
> memory sharing to the Arrow specification. I will start a separate
> thread shortly about the metadata unless someone beats me to it.
>
> Note: I will have some bandwidth the next month to work on the C++
> Arrow + Python Arrow + Parquet toolchain, so I plan to drop a series
> of patches to enable Python pandas users to read Parquet files (using
> https://github.com/apache/parquet-cpp) via Arrow data structures
> (since pandas requires Arrow to be marshalled to NumPy arrays to be
> used).
>
> - Wes
>
> On Sat, Feb 27, 2016 at 10:06 AM, Jason Altekruse
> <altekruseja...@gmail.com> wrote:
> > The java version of the Arrow project is reasonably consumable. The code
> > was extracted from the Apache Drill project which has been using this
> > columnar representation since its inception.
> >
> > Steven Phillips is working on finishing the extraction of the necessary
> > interfaces from Drill over in his fork of the arrow repository [1], when
> > this gets checked in Drill will be completely separated from Arrow and just
> > depending on it as any other consumer would. The branch is still work in
> > progress but I believe he is getting close to posting a patch for review.
> > If you want you could check out the code in the Drill repository right now
> > [2], seeing the vector classes requires running the build once because we
> > use code generation to create vectors for each data type. After running the
> > Drill build the vector classes can be found at
> > exec/vector/target/generated-sources.
> >
> > [1] - https://github.com/StevenMPhillips/arrow
> > [2] - https://github.com/apache/drill
> >
> > On Fri, Feb 26, 2016 at 8:56 PM, Vishnu Viswanath <
> > vishnu.viswanat...@gmail.com> wrote:
> >
> >> Thanks Leif,
> >> I am not trying to incorporate Arrow to any production system. I am just
> >> trying to learn this new DS.
> >> If you have come across any blogs or if you can tell what should be the
> >> starting steps in using Arrow, could you please let me know.
> >>
> >> --
> >> Thanks and Regards,
> >> Vishnu Viswanath,
> >> *www.vishnuviswanath.com <http://www.vishnuviswanath.com/>*
> >>
> >> On Fri, Feb 26, 2016 at 9:36 PM, Leif Walsh <leif.wa...@gmail.com> wrote:
> >>
> >> > Arrow doesn't seem to be ready for use yet. I think it's an aspirational
> >> > project. I'd watch for announcements soon but I wouldn't try to
> >> > incorporate today.
> >> >
> >> > On Fri, Feb 26, 2016 at 2:10 PM Slava B <gslav...@gmail.com> wrote:
> >> >
> >> > > Agree, also looking for such tutorial
> >> > >
> >> > > On Fri, Feb 26, 2016 at 11:05 AM, Vishnu Viswanath <
> >> > > vishnu.viswanat...@gmail.com> wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > I just joined this list, and would like to know if there is any
> >> > > > documentation on how to get started with Apache Arrow. I am interested in
> >> > > > using arrow along with Spark or Flink.
> >> > > >
> >> > > > Thanks and Regards,
> >> > > > Vishnu Viswanath,
> >> > > > *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
> >> > > >
> >> > >
> >> > --
> >> > --
> >> > Cheers,
> >> > Leif
> >> >