Here is the PR with the code:
https://github.com/apache/parquet-mr/pull/356
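
For readers without the PR at hand, here is a minimal sketch of the idea discussed further down the thread (Parquet definition levels to Arrow offsets, with little branching). It is not the code from the PR. It assumes one simplified column shape, an optional group containing a repeated required int32, so definition level 0 means a null list, 1 means an empty list, and 2 means a value is present, and the maximum repetition level is 1. The function name `levels_to_offsets` and the constant `MAX_DEF` are hypothetical.

```python
MAX_DEF = 2  # assumed max definition level for: optional group { repeated int32 }


def levels_to_offsets(def_levels, rep_levels):
    """Build Arrow-style list offsets and a validity list from Parquet levels.

    Assumes the simplified schema described above. Comparisons are added as
    0/1 integers instead of being tested with if/else in the loop body.
    """
    # A repetition level of 0 marks the start of a new record (a new list).
    n_lists = sum(r == 0 for r in rep_levels)
    offsets = [0] * (n_lists + 1)
    validity = [False] * n_lists

    list_idx = -1  # index of the list currently being filled
    n_values = 0   # number of leaf values seen so far

    for d, r in zip(def_levels, rep_levels):
        list_idx += (r == 0)        # advance to the next list on a record start
        n_values += (d == MAX_DEF)  # count only entries that carry a value
        offsets[list_idx + 1] = n_values
        validity[list_idx] |= (d >= 1)  # level 0 means the list itself is null

    return offsets, validity


# Records: [1, 2], null, [], [3]
offsets, validity = levels_to_offsets([2, 2, 0, 1, 2], [0, 1, 0, 0, 0])
print(offsets, validity)
```

The offsets array has one more entry than there are lists, and consecutive entries delimit each list's slice of the values array, which is why null and empty lists both repeat the previous offset and differ only in validity.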

On Fri, Jul 29, 2016 at 2:59 PM, Julien Le Dem <[email protected]> wrote:

> I will send a pull request soon with the code.
> Repetition levels are redundant as they encode information from the
> parents in the leaf nodes.
> I haven’t looked into that yet but we could make some versions of the code
> that ignore the parent nodes for the other leaves.
> Julien
>
> > On Jul 28, 2016, at 2:28 PM, Wes McKinney <[email protected]> wrote:
> >
> > Hi Julien,
> >
> > This is great to hear. Do you have code or an algorithm sketch for the
> > conversion? I would like to work on the C++ Parquet to Arrow
> > vectorized conversion in the next few months. One of the things I
> > haven't thought through is how to jointly decode leaf nodes that are
> > part of the same branch (e.g. foo.bar.baz and foo.bar.qux together)
> > without redundant computation (perhaps this is what you're alluding
> > to).
> >
> > Thanks,
> > Wes
> >
> > On Sat, Jul 16, 2016 at 9:49 PM, Julien Le Dem <[email protected]> wrote:
> >> On my end I did a few versions of vectorized conversion from parquet
> >> definition levels to arrow offsets.
> >> Some tricks to avoid branching work well.
> >> I'll publish something soon.
> >>
> >> Julien
> >>
> >>> On Jul 15, 2016, at 19:04, Jacques Nadeau <[email protected]> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I had a great time at the Hackathon. Thanks to Julien for putting this
> >>> together! Thanks to everyone who joined.
> >>>
> >>> There were some good discussions and some exploration work. I started
> >>> exploring a paradigm for supporting a zero performance impact abstraction
> >>> approach to on and off heap access currently named slyheap. I'm exploring
> >>> using sentinel objects and bytecode rewriting to avoid extra indirections
> >>> for primitive arrays when wanting to swap out to ArrowBuf. I'll be out of
> >>> town over the next week but will try to post some progress on this the
> >>> following week.
> >>>
> >>> thanks,
> >>> Jacques
> >>>
> >>>
> >>>> On Thu, Jul 14, 2016 at 8:45 AM, Julien Le Dem <[email protected]> wrote:
> >>>>
> >>>> I'm currently in
> >>>> - the hangout: https://hangouts.google.com/hangouts/_/dremio.com/parquet
> >>>> - the irc channel parquet on irc.freenode.net
> >>>>
> >>>> On Tue, Jul 12, 2016 at 4:04 PM, Jacques Nadeau <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> 883 N Shoreline Blvd, Suite C100, Mountain View, CA
> >>>>>
> >>>>> On Tue, Jul 12, 2016 at 3:16 PM, Parth Chandra <[email protected]> wrote:
> >>>>>
> >>>>>> Can you post the address? I'll try to join the morning session.
> >>>>>>
> >>>>>> On Mon, Jul 11, 2016 at 9:36 PM, Julien Le Dem <[email protected]> wrote:
> >>>>>>
> >>>>>>> Confirming that we’ll do the Parquet Hackathon this Thursday July 14th
> >>>>>>> Pacific time (GMT-7 in summer)
> >>>>>>> There will be a Google hangout (I’ll send an invite and a link) and an IRC
> >>>>>>> channel (parquet channel on irc.freenode.net)
> >>>>>>> The location is the Dremio office on Shoreline Blvd, Mountain View, CA
> >>>>>>>
> >>>>>>> Responded:
> >>>>>>> - Jason
> >>>>>>> - Julien
> >>>>>>> - Nezih
> >>>>>>> - Deepak
> >>>>>>> - Ryan
> >>>>>>> - Jacques
> >>>>>>> - Urvish
> >>>>>>> Will join remotely:
> >>>>>>> - Uwe (GMT+1 in the morning)
> >>>>>>> - Ferd (GMT+8, in the afternoon 3:30pm -> 9pm)
> >>>>>>> - Wes
> >>>>>>>
> >>>>>>> I’ll probably be on irc/hangout while on the train 8:33am -> 9:46am and be
> >>>>>>> there around 10am
> >>>>>>> There will be people to open the door earlier.
> >>>>>>>
> >>>>>>> Agenda/things that have been mentioned on the thread:
> >>>>>>> - Parquet <-> Arrow
> >>>>>>> - Parquet-cpp->Arrow-C++->PyArrow
> >>>>>>> - https://issues.apache.org/jira/browse/HIVE-8128
> >>>>>>> - vectorized read in Drill
> >>>>>>> - https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet
> >>>>>>> - https://github.com/apache/parquet-mr/pull/257
> >>>>>>>
> >>>>>>> Feel free to add more/show up
> >>>>>>>
> >>>>>>>> On Jul 8, 2016, at 10:54 PM, Julien Le Dem <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> There is the parquet channel on irc.freenode.net
> >>>>>>>> I'll set up a hangout as well.
> >>>>>>>>
> >>>>>>>>> On Fri, Jul 8, 2016 at 9:54 AM, Wes McKinney <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Do we yet have a Slack / IRC for Parquet? I will be joining remotely
> >>>>>>>>> throughout the day. For anyone who is interested in algorithms for Arrow
> >>>>>>>>> nested data <-> Parquet disassembly/reassembly, we should start a
> >>>>>>>>> shared Google document to detail algorithms and various test cases
> >>>>>>>>> we'll need to address in the implementation.
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 6, 2016 at 2:30 PM, Deepak Majeti <[email protected]> wrote:
> >>>>>>>>>> 14th works for me too. I am mainly interested in vectorizing
> >>>>>>>>>> parquet-cpp as well.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jul 6, 2016 at 4:50 PM, Nezih Yigitbasi
> >>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>> 14th works for me too.
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 6, 2016 at 12:54 AM Uwe Korn <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Yes, I'm GMT +1
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 05.07.16 18:52, Julien Le Dem wrote:
> >>>>>>>>>>>>> If there are people interested in the cpp implementation we’ll talk
> >>>>>>>>>>>>> about that too.
> >>>>>>>>>>>>> I’m happy to give context or help with the encoding. In particular a
> >>>>>>>>>>>>> Parquet -> Arrow vectorized converter would be great.
> >>>>>>>>>>>>> Are you GMT +1 ?
> >>>>>>>>>>>>> We can schedule a 1 hour slot in the morning for discussing with remote
> >>>>>>>>>>>>> folks in Europe. (same in afternoon if there are people joining from Asia)
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jul 5, 2016, at 2:37 AM, Uwe Korn <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Is this effort only for the parquet-mr project, or would there also be
> >>>>>>>>>>>>>> some work/benefit for parquet-cpp? If so, I might join briefly in a hangout
> >>>>>>>>>>>>>> but due to the timezone shift, I probably will not be able to be awake all
> >>>>>>>>>>>>>> the time.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Uwe
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 02.07.16 01:01, Julien Le Dem wrote:
> >>>>>>>>>>>>>>> Dear Parquet dev list,
> >>>>>>>>>>>>>>> There have been efforts in several projects for vectorized reads of
> >>>>>>>>>>>>>>> Parquet.
> >>>>>>>>>>>>>>> We had discussed during the Parquet sync up to organize a hackathon to
> >>>>>>>>>>>>>>> brainstorm and look into a shared implementation.
> >>>>>>>>>>>>>>> Some projects that would benefit:
> >>>>>>>>>>>>>>> - Apache Drill
> >>>>>>>>>>>>>>> - Apache Arrow
> >>>>>>>>>>>>>>> - Apache Spark
> >>>>>>>>>>>>>>> - Presto
> >>>>>>>>>>>>>>> - Apache Hive
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm planning to organize this at the Dremio office in Mountain View with
> >>>>>>>>>>>>>>> optionally a hangout for people who would want to join remotely.
> >>>>>>>>>>>>>>> I'm adding to the "to:" people that have expressed interest or could be
> >>>>>>>>>>>>>>> interested but that's not an exhaustive list. Please respond to this email
> >>>>>>>>>>>>>>> if you wish to be included.
> >>>>>>>>>>>>>>> Who's interested and what dates would work between this Tuesday 7/5 and
> >>>>>>>>>>>>>>> Wednesday 7/20 ?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> regards,
> >>>>>>>>>> Deepak Majeti
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Julien
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Julien
> >>>>
> >>
>
>


-- 
Julien
