Thank you everyone.

On Sat, Feb 27, 2016 at 2:00 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> Wes is right, I should have qualified my statement about the Drill code. As
> is stated in the Arrow repo initial design docs, the exact memory layout is
> not finalized. That being said, while the format is designed to be used
> in-memory, it doesn't have the same sticking point about backwards
> compatibility of a persistent format. Eventually it is possible that
> someone may use arrow structures for a long-lived in-memory cache, or
> persist arrow vectors to disk, but this would not be an optimal time to
> start such a project, as the format is not fully defined.
>
> A more appropriate statement would have been, in Drill (and soon to be
> fully moved over to the arrow repo) there is an interface that you can use
> to access arrow-like structures, that will be evolving along with the arrow
> standard's ongoing development. If you are willing to work alongside the
> upcoming refactorings you could start integrating these interfaces into
> other projects. The in-memory structures they use do not yet represent a
> version of the arrow specification, as we have not yet finished discussing
> several parts of the specification, as summarized nicely by Wes, but they
> will be updated throughout the upcoming discussions.
>
> On Sat, Feb 27, 2016 at 11:11 AM, Wes McKinney <w...@cloudera.com> wrote:
>
> > Note that we have not prioritized building a lot of new software for
> > Arrow (outside of the basic C++ implementation and the Drill Java
> > extraction) because there are a number of details that we need to work
> > out as a group in the coming weeks:
> >
> > - Lingering physical memory layout questions, see working documents
> > https://github.com/apache/arrow/tree/master/format
> > - Metadata / schema details
> > - IPC / wire protocol
> >
> > As a project, these aspects of the Arrow specification are much more
> > important than any lines of code, because they define what it means to
> > "use Arrow". So getting started with Arrow is less about using a
> > particular piece of software but rather conforming data structures and
> > memory sharing to the Arrow specification. I will start a separate
> > thread shortly about the metadata unless someone beats me to it.
> >
> > Note: I will have some bandwidth the next month to work on the C++
> > Arrow + Python Arrow + Parquet toolchain, so I plan to drop a series
> > of patches to enable Python pandas users to read Parquet files (using
> > https://github.com/apache/parquet-cpp) via Arrow data structures
> > (since pandas requires Arrow to be marshalled to NumPy arrays to be
> > used).
> >
> > - Wes
> >
> > On Sat, Feb 27, 2016 at 10:06 AM, Jason Altekruse
> > <altekruseja...@gmail.com> wrote:
> > > The Java version of the Arrow project is reasonably consumable. The code
> > > was extracted from the Apache Drill project which has been using this
> > > columnar representation since its inception.
> > >
> > > Steven Phillips is working on finishing the extraction of the necessary
> > > interfaces from Drill over in his fork of the arrow repository [1], when
> > > this gets checked in Drill will be completely separated from Arrow and just
> > > depending on it as any other consumer would. The branch is still work in
> > > progress but I believe he is getting close to posting a patch for review.
> > > If you want you could check out the code in the Drill repository right now
> > > [2], seeing the vector classes requires running the build once because we
> > > use code generation to create vectors for each data type. After running the
> > > Drill build the vector classes can be found at
> > > exec/vector/target/generated-sources.
> > >
> > > [1] - https://github.com/StevenMPhillips/arrow
> > > [2] - https://github.com/apache/drill
> > >
> > > On Fri, Feb 26, 2016 at 8:56 PM, Vishnu Viswanath <
> > > vishnu.viswanat...@gmail.com> wrote:
> > >
> > >> Thanks Leif,
> > >> I am not trying to incorporate Arrow to any production system. I am just
> > >> trying to learn this new DS.
> > >> If you have come across any blogs or if you can tell what should be the
> > >> starting steps in using Arrow, could you please let me know.
> > >>
> > >> --
> > >> Thanks and Regards,
> > >> Vishnu Viswanath,
> > >> *www.vishnuviswanath.com <http://www.vishnuviswanath.com/>*
> > >>
> > >> On Fri, Feb 26, 2016 at 9:36 PM, Leif Walsh <leif.wa...@gmail.com> wrote:
> > >>
> > >> > Arrow doesn't seem to be ready for use yet. I think it's an aspirational
> > >> > project. I'd watch for announcements soon but I wouldn't try to
> > >> > incorporate today.
> > >> >
> > >> > On Fri, Feb 26, 2016 at 2:10 PM Slava B <gslav...@gmail.com> wrote:
> > >> >
> > >> > > Agree, also looking for such tutorial
> > >> > >
> > >> > > On Fri, Feb 26, 2016 at 11:05 AM, Vishnu Viswanath <
> > >> > > vishnu.viswanat...@gmail.com> wrote:
> > >> > >
> > >> > > > Hi All,
> > >> > > >
> > >> > > > I just joined this list, and would like to know if there is any
> > >> > > > documentation on how to get started with Apache Arrow. I am interested in
> > >> > > > using arrow along with Spark or Flink.
> > >> > > >
> > >> > > > Thanks and Regards,
> > >> > > > Vishnu Viswanath,
> > >> > > > *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
> > >> > > >
> > >> > >
> > >> > --
> > >> > Cheers,
> > >> > Leif
> > >> >
> > >>
> > >

--
Thanks and Regards,
Vishnu Viswanath,
*www.vishnuviswanath.com <http://www.vishnuviswanath.com>*