Hi Julien,

This is great to hear. Do you have code or an algorithm sketch for the
conversion? I would like to work on the C++ Parquet to Arrow
vectorized conversion in the next few months. One of the things I
haven't thought through is how to jointly decode leaf nodes that are
part of the same branch (e.g. foo.bar.baz and foo.bar.qux together)
without redundant computation (perhaps this is what you're alluding
to).

Thanks,
Wes

On Sat, Jul 16, 2016 at 9:49 PM, Julien Le Dem <[email protected]> wrote:
> On my end I did a few versions of vectorized conversion from parquet 
> definition levels to arrow offsets.
> Some tricks to avoid branching work well.
> I'll publish something soon.
>
> Julien
>
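A minimal sketch of the kind of branch-free conversion Julien mentions, assuming the simplest nested case: a single `repeated int32` column with max repetition level 1 and max definition level 1 (one list of non-null values per record). In this case the list offsets depend on the repetition levels as well as the definition levels. A repetition level of 0 starts a new record, and a definition level of 1 means a value is present. `LevelsToListOffsets` is a hypothetical helper, not a parquet-cpp or Arrow API. The trick shown (write the offset unconditionally, then advance the output index by a 0/1 comparison result) is one common way to avoid branching. It is not necessarily the approach Julien used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a parquet-cpp or Arrow API). Assumes a single
// `repeated int32` column with max repetition level 1 and max definition
// level 1: rep == 0 starts a new record, def == 1 means a value is present,
// def == 0 means that record's list is empty. Assumes well-formed input
// (the first entry has rep == 0). Returns num_records + 1 offsets.
std::vector<int32_t> LevelsToListOffsets(const std::vector<int16_t>& rep_levels,
                                         const std::vector<int16_t>& def_levels) {
  assert(rep_levels.size() == def_levels.size());
  const std::size_t n = rep_levels.size();
  // n level entries can start at most n records, plus one closing offset.
  std::vector<int32_t> offsets(n + 1);
  std::size_t num_records = 0;
  int32_t num_values = 0;
  for (std::size_t i = 0; i < n; ++i) {
    // Always write the running value count into the next record's slot.
    // The slot is only "committed" when this entry starts a record;
    // otherwise it is overwritten later. The comparisons evaluate to 0 or 1,
    // so the loop body contains no data-dependent if/else.
    offsets[num_records] = num_values;
    num_records += (rep_levels[i] == 0);
    num_values += (def_levels[i] == 1);
  }
  // Closing offset: the total number of values.
  offsets[num_records] = num_values;
  offsets.resize(num_records + 1);
  return offsets;
}
```

For example, repetition levels {0, 1, 0, 0} with definition levels {1, 1, 0, 1} encode the three records [a, b], [], [c] and produce the offsets {0, 2, 2, 3}. Whether the compiler actually emits branch-free code for the loop body is worth checking in the generated assembly.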
>> On Jul 15, 2016, at 19:04, Jacques Nadeau <[email protected]> wrote:
>>
>> Hello,
>>
>> I had a great time at the Hackathon. Thanks to Julien for putting this
>> together! Thanks to everyone who joined.
>>
>> There were some good discussions and some exploration work. I started
>> exploring a paradigm for a zero-performance-impact abstraction over
>> on-heap and off-heap access, currently named slyheap. I'm exploring
>> using sentinel objects and bytecode rewriting to avoid extra indirections
>> for primitive arrays when swapping them out for ArrowBuf. I'll be out of
>> town over the next week but will try to post some progress on this the
>> following week.
>>
>> thanks,
>> Jacques
>>
>>
>>> On Thu, Jul 14, 2016 at 8:45 AM, Julien Le Dem <[email protected]> wrote:
>>>
>>> I'm currently in
>>> - the hangout: https://hangouts.google.com/hangouts/_/dremio.com/parquet
>>> - the irc channel parquet on irc.freenode.net
>>>
>>> On Tue, Jul 12, 2016 at 4:04 PM, Jacques Nadeau <[email protected]>
>>> wrote:
>>>
>>>> 883 N Shoreline Blvd, Suite C100, Mountain View, CA
>>>>
>>>> On Tue, Jul 12, 2016 at 3:16 PM, Parth Chandra <[email protected]>
>>>> wrote:
>>>>
>>>>> Can you post the address? I'll try to join the morning session.
>>>>>
>>>>> On Mon, Jul 11, 2016 at 9:36 PM, Julien Le Dem <[email protected]> wrote:
>>>>>
>>>>>> Confirming that we’ll do the Parquet Hackathon this Thursday July 14th
>>>>>> Pacific time (GMT-7 in summer)
>>>>>> There will be a Google hangout (I’ll send an invite and a link) and an
>>>>>> IRC channel (parquet channel on irc.freenode.net)
>>>>>> The location is the Dremio office on Shoreline Blvd, Mountain View, CA
>>>>>>
>>>>>> Responded:
>>>>>> - Jason
>>>>>> - Julien
>>>>>> - Nezih
>>>>>> - Deepak
>>>>>> - Ryan
>>>>>> - Jacques
>>>>>> - Urvish
>>>>>> Will join remotely:
>>>>>> - Uwe (GMT+1 in the morning)
>>>>>> - Ferd (GMT+8, in the afternoon 3:30pm -> 9pm)
>>>>>> - Wes
>>>>>>
>>>>>> I’ll probably be on irc/hangout while on the train 8:33am -> 9:46am and
>>>>>> be there around 10am
>>>>>> There will be people to open the door earlier.
>>>>>>
>>>>>> Agenda/things that have been mentioned on the thread:
>>>>>> - Parquet <-> Arrow
>>>>>> - Parquet-cpp->Arrow-C++->PyArrow
>>>>>> - https://issues.apache.org/jira/browse/HIVE-8128
>>>>>> - vectorized read in Drill
>>>>>> - https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet
>>>>>> - https://github.com/apache/parquet-mr/pull/257
>>>>>>
>>>>>> Feel free to add more/show up
>>>>>>
>>>>>>> On Jul 8, 2016, at 10:54 PM, Julien Le Dem <[email protected]> wrote:
>>>>>>>
>>>>>>> There is the parquet channel on irc.freenode.net
>>>>>>> I'll set up a hangout as well.
>>>>>>>
>>>>>>>> On Fri, Jul 8, 2016 at 9:54 AM, Wes McKinney <[email protected]> wrote:
>>>>>>>
>>>>>>>> Do we have a Slack / IRC channel for Parquet yet? I will be joining
>>>>>>>> remotely throughout the day. For anyone who is interested in
>>>>>>>> algorithms for Arrow nested data <-> Parquet disassembly/reassembly,
>>>>>>>> we should start a shared Google document to detail the algorithms and
>>>>>>>> the various test cases we'll need to address in the implementation.
>>>>>>>>
>>>>>>>> On Wed, Jul 6, 2016 at 2:30 PM, Deepak Majeti <[email protected]> wrote:
>>>>>>>>> 14th works for me too. I am mainly interested in vectorizing
>>>>>>>>> parquet-cpp as well.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 6, 2016 at 4:50 PM, Nezih Yigitbasi
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> 14th works for me too.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 6, 2016 at 12:54 AM Uwe Korn <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, I'm GMT +1
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 05.07.16 18:52, Julien Le Dem wrote:
>>>>>>>>>>>> If there are people interested in the cpp implementation we’ll talk
>>>>>>>>>>>> about that too.
>>>>>>>>>>>> I’m happy to give context or help with the encoding. In particular a
>>>>>>>>>>>> Parquet -> Arrow vectorized converter would be great.
>>>>>>>>>>>> Are you GMT+1?
>>>>>>>>>>>> We can schedule a 1-hour slot in the morning for discussing with
>>>>>>>>>>>> remote folks in Europe (same in the afternoon if there are people
>>>>>>>>>>>> joining from Asia).
>>>>>>>>>>>> Julien
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 5, 2016, at 2:37 AM, Uwe Korn <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this effort only for the parquet-mr project, or would there
>>>>>>>>>>>>> also be some work/benefit for parquet-cpp? If so, I might join a
>>>>>>>>>>>>> hangout briefly, but due to the time zone difference I probably
>>>>>>>>>>>>> won't be able to stay awake the whole time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Uwe
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 02.07.16 01:01, Julien Le Dem wrote:
>>>>>>>>>>>>>> Dear Parquet dev list,
>>>>>>>>>>>>>> There have been efforts in several projects toward vectorized
>>>>>>>>>>>>>> reads of Parquet.
>>>>>>>>>>>>>> During the Parquet sync-up we discussed organizing a hackathon to
>>>>>>>>>>>>>> brainstorm and look into a shared implementation.
>>>>>>>>>>>>>> Some projects that would benefit:
>>>>>>>>>>>>>> - Apache Drill
>>>>>>>>>>>>>> - Apache Arrow
>>>>>>>>>>>>>> - Apache Spark
>>>>>>>>>>>>>> - Presto
>>>>>>>>>>>>>> - Apache Hive
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm planning to organize this at the Dremio office in Mountain View,
>>>>>>>>>>>>>> with an optional hangout for people who want to join remotely.
>>>>>>>>>>>>>> I'm adding to the "to:" line people who have expressed interest or
>>>>>>>>>>>>>> could be interested, but that's not an exhaustive list. Please
>>>>>>>>>>>>>> respond to this email if you wish to be included.
>>>>>>>>>>>>>> Who's interested, and what dates would work between this Tuesday,
>>>>>>>>>>>>>> 7/5, and Wednesday, 7/20?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> regards,
>>>>>>>>> Deepak Majeti
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Julien
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>
