On my end, I wrote a few versions of a vectorized conversion from Parquet definition levels to Arrow offsets. Some tricks to avoid branching work well. I'll publish something soon.
Julien

> On Jul 15, 2016, at 19:04, Jacques Nadeau <[email protected]> wrote:
>
> Hello,
>
> I had a great time at the Hackathon. Thanks to Julien for putting this
> together! Thanks to everyone who joined.
>
> There were some good discussions and some exploration work. I started
> exploring a paradigm for supporting a zero performance impact abstraction
> approach to on and off heap access currently named slyheap. I'm exploring
> using sentinel objects and bytecode rewriting to avoid extra indirections
> for primitive arrays when wanting to swap out to ArrowBuf. I'll be out of
> town over the next week but will try to post some progress on this the
> following week.
>
> thanks,
> Jacques
>
>
>> On Thu, Jul 14, 2016 at 8:45 AM, Julien Le Dem <[email protected]> wrote:
>>
>> I'm currently in
>> - the hangout: https://hangouts.google.com/hangouts/_/dremio.com/parquet
>> - the irc channel parquet on irc.freenode.net
>>
>> On Tue, Jul 12, 2016 at 4:04 PM, Jacques Nadeau <[email protected]> wrote:
>>
>>> 883 N Shoreline Blvd, Suite C100, Mountain View, CA
>>>
>>> On Tue, Jul 12, 2016 at 3:16 PM, Parth Chandra <[email protected]> wrote:
>>>
>>>> Can you post the address? I'll try to join the morning session.
>>>>
>>>> On Mon, Jul 11, 2016 at 9:36 PM, Julien Le Dem <[email protected]> wrote:
>>>>
>>>>> Confirming that we’ll do the Parquet Hackathon this Thursday July 14th
>>>>> Pacific time (GMT-7 in summer)
>>>>> There will be a Google hangout (I’ll send an invite and a link) and an
>>>>> IRC channel (parquet channel on irc.freenode.net)
>>>>> The location is the Dremio office on Shoreline Blvd, Mountain View, CA
>>>>>
>>>>> Responded:
>>>>> - Jason
>>>>> - Julien
>>>>> - Nezih
>>>>> - Deepak
>>>>> - Ryan
>>>>> - Jacques
>>>>> - Urvish
>>>>> Will join remotely:
>>>>> - Uwe (GMT+1 in the morning)
>>>>> - Ferd (GMT+8, in the afternoon 3:30pm -> 9pm)
>>>>> - Wes
>>>>>
>>>>> I’ll probably be on irc/hangout while on the train 8:33am -> 9:46am and
>>>>> be there around 10am
>>>>> There will be people to open the door earlier.
>>>>>
>>>>> Agenda/things that have been mentioned on the thread:
>>>>> - Parquet <-> Arrow
>>>>> - Parquet-cpp->Arrow-C++->PyArrow
>>>>> - https://issues.apache.org/jira/browse/HIVE-8128
>>>>> - vectorized read in Drill
>>>>> - https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet
>>>>> - https://github.com/apache/parquet-mr/pull/257
>>>>>
>>>>> Feel free to add more/show up
>>>>>
>>>>>> On Jul 8, 2016, at 10:54 PM, Julien Le Dem <[email protected]> wrote:
>>>>>>
>>>>>> There is the parquet channel on irc.freenode.net
>>>>>> I'll set up a hangout as well.
>>>>>>
>>>>>>> On Fri, Jul 8, 2016 at 9:54 AM, Wes McKinney <[email protected]> wrote:
>>>>>>
>>>>>>> Do we yet have a Slack / IRC for Parquet? I will be joining remotely
>>>>>>> throughout the day. Anyone who is interested in algorithms for Arrow
>>>>>>> nested data <-> Parquet disassembly/reassembly, we should start a
>>>>>>> shared Google document to detail algorithms and various test cases
>>>>>>> we'll need to address in the implementation.
>>>>>>>
>>>>>>> On Wed, Jul 6, 2016 at 2:30 PM, Deepak Majeti <[email protected]> wrote:
>>>>>>>> 14th works for me too. I am mainly interested in vectorizing
>>>>>>>> parquet-cpp as well.
>>>>>>>>
>>>>>>>> On Wed, Jul 6, 2016 at 4:50 PM, Nezih Yigitbasi
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> 14th works for me too.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 6, 2016 at 12:54 AM Uwe Korn <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, I'm GMT +1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 05.07.16 18:52, Julien Le Dem wrote:
>>>>>>>>>>> If there are people interested in the cpp implementation we’ll
>>>>>>>>>>> talk about that too.
>>>>>>>>>>> I’m happy to give context or help with the encoding. In
>>>>>>>>>>> particular a Parquet -> Arrow vectorized converter would be great.
>>>>>>>>>>> Are you GMT +1 ?
>>>>>>>>>>> We can schedule a 1 hour slot in the morning for discussing with
>>>>>>>>>>> remote folks in Europe. (same in afternoon if there are people
>>>>>>>>>>> joining from Asia)
>>>>>>>>>>> Julien
>>>>>>>>>>>
>>>>>>>>>>>> On Jul 5, 2016, at 2:37 AM, Uwe Korn <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> is this effort only for the parquet-mr project, or would there
>>>>>>>>>>>> also be some work/benefit for parquet-cpp? If so, I might join
>>>>>>>>>>>> briefly in a hangout but due to the timezone shift, I probably
>>>>>>>>>>>> will not be able to be awake all the time.
>>>>>>>>>>>>
>>>>>>>>>>>> Uwe
>>>>>>>>>>>>
>>>>>>>>>>>>> On 02.07.16 01:01, Julien Le Dem wrote:
>>>>>>>>>>>>> Dear Parquet dev list,
>>>>>>>>>>>>> There have been efforts in several projects for vectorized
>>>>>>>>>>>>> reads of Parquet.
>>>>>>>>>>>>> We had discussed during the Parquet sync up to organize a
>>>>>>>>>>>>> hackathon to brainstorm and look into a shared implementation.
>>>>>>>>>>>>> Some projects that would benefit:
>>>>>>>>>>>>> - Apache Drill
>>>>>>>>>>>>> - Apache Arrow
>>>>>>>>>>>>> - Apache Spark
>>>>>>>>>>>>> - Presto
>>>>>>>>>>>>> - Apache Hive
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm planning to organize this at the Dremio office in Mountain
>>>>>>>>>>>>> View with optionally a hangout for people who would want to
>>>>>>>>>>>>> join remotely.
>>>>>>>>>>>>> I'm adding to the "to:" people that have expressed interest or
>>>>>>>>>>>>> could be interested but that's not an exhaustive list. Please
>>>>>>>>>>>>> respond to this email if you wish to be included.
>>>>>>>>>>>>> Who's interested and what dates would work between this
>>>>>>>>>>>>> Tuesday 7/5 and Wednesday 7/20 ?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> regards,
>>>>>>>> Deepak Majeti
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Julien
>>
>>
>>
>> --
>> Julien
>>
