On my end, I wrote a few versions of a vectorized conversion from Parquet
definition levels to Arrow offsets.
Some tricks to avoid branching work well.
I'll publish something soon.

Julien

> On Jul 15, 2016, at 19:04, Jacques Nadeau <[email protected]> wrote:
> 
> Hello,
> 
> I had a great time at the Hackathon. Thanks to Julien for putting this
> together! Thanks to everyone who joined.
> 
> There were some good discussions and some exploration work. I started
> exploring a paradigm for supporting a zero performance impact abstraction
> approach to on and off heap access currently named slyheap. I'm exploring
> using sentinel objects and bytecode rewriting to avoid extra indirections
> for primitive arrays when wanting to swap out to ArrowBuf. I'll be out of
> town over the next week but will try to post some progress on this the
> following week.
> 
> thanks,
> Jacques
> 
> 
>> On Thu, Jul 14, 2016 at 8:45 AM, Julien Le Dem <[email protected]> wrote:
>> 
>> I'm currently in
>> - the hangout: https://hangouts.google.com/hangouts/_/dremio.com/parquet
>> - the irc channel parquet on irc.freenode.net
>> 
>> On Tue, Jul 12, 2016 at 4:04 PM, Jacques Nadeau <[email protected]>
>> wrote:
>> 
>>> 883 N Shoreline Blvd, Suite C100, Mountain View, CA
>>> 
>>> On Tue, Jul 12, 2016 at 3:16 PM, Parth Chandra <[email protected]>
>>> wrote:
>>> 
>>>> Can you post the address? I'll try to join the morning session.
>>>> 
>>>> On Mon, Jul 11, 2016 at 9:36 PM, Julien Le Dem <[email protected]>
>> wrote:
>>>> 
>>>>> Confirming that we’ll do the Parquet Hackathon this Thursday July 14th
>>>>> Pacific time (GMT-7 in summer)
>>>>> There will be a Google hangout (I’ll send an invite and a link) and an IRC
>>>>> channel (parquet channel on irc.freenode.net)
>>>>> The location is the Dremio office on Shoreline Blvd, Mountain View, CA
>>>>> 
>>>>> Responded:
>>>>> - Jason
>>>>> - Julien
>>>>> - Nezih
>>>>> - Deepak
>>>>> - Ryan
>>>>> - Jacques
>>>>> - Urvish
>>>>> Will join remotely:
>>>>> - Uwe (GMT+1 in the morning)
>>>>> - Ferd (GMT+8, in the afternoon 3:30pm -> 9pm)
>>>>> - Wes
>>>>> 
>>>>> I’ll probably be on irc/hangout while on the train 8:33am -> 9:46am and be
>>>>> there around 10am
>>>>> There will be people to open the door earlier.
>>>>> 
>>>>> Agenda/things that have been mentioned on the thread:
>>>>> - Parquet <-> Arrow
>>>>> - Parquet-cpp->Arrow-C++->PyArrow
>>>>> - https://issues.apache.org/jira/browse/HIVE-8128
>>>>> - vectorized read in Drill
>>>>> - https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet
>>>>> - https://github.com/apache/parquet-mr/pull/257
>>>>> 
>>>>> Feel free to add more/show up
>>>>> 
>>>>>> On Jul 8, 2016, at 10:54 PM, Julien Le Dem <[email protected]>
>>> wrote:
>>>>>> 
>>>>>> There is the parquet channel on irc.freenode.net
>>>>>> I'll set up a hangout as well.
>>>>>> 
>>>>>>> On Fri, Jul 8, 2016 at 9:54 AM, Wes McKinney <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Do we yet have a Slack / IRC for Parquet? I will be joining remotely
>>>>>>> throughout the day. Anyone who is interested in algorithms for Arrow
>>>>>>> nested data <-> Parquet disassembly/reassembly, we should start a
>>>>>>> shared Google document to detail algorithms and various test cases
>>>>>>> we'll need to address in the implementation.
>>>>>>> 
>>>>>>> On Wed, Jul 6, 2016 at 2:30 PM, Deepak Majeti <
>>>> [email protected]>
>>>>>>> wrote:
>>>>>>>> 14th works for me too. I am mainly interested in vectorizing
>>>>>>>> parquet-cpp as well.
>>>>>>>> 
>>>>>>>> On Wed, Jul 6, 2016 at 4:50 PM, Nezih Yigitbasi
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> 14th works for me too.
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 6, 2016 at 12:54 AM Uwe Korn <[email protected]>
>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Yes, I'm GMT +1
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 05.07.16 18:52, Julien Le Dem wrote:
>>>>>>>>>>> If there are people interested in the cpp implementation we’ll talk
>>>>>>>>>>> about that too.
>>>>>>>>>>> I’m happy to give context or help with the encoding. In particular a
>>>>>>>>>>> Parquet -> Arrow vectorized converter would be great.
>>>>>>>>>>> Are you GMT +1 ?
>>>>>>>>>>> We can schedule a 1 hour slot in the morning for discussing with remote
>>>>>>>>>>> folks in Europe. (same in afternoon if there are people joining from Asia)
>>>>>>>>>>> Julien
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 5, 2016, at 2:37 AM, Uwe Korn <[email protected]>
>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> 
>>>>>>>>>>>> this effort is only for the parquet-mr project or would there also be
>>>>>>>>>>>> some work/benefit for parquet-cpp? If so, I might join briefly in a hangout
>>>>>>>>>>>> but due to the timezone shift, I probably will not be able to be awake all
>>>>>>>>>>>> the time.
>>>>>>>>>>>> 
>>>>>>>>>>>> Uwe
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 02.07.16 01:01, Julien Le Dem wrote:
>>>>>>>>>>>>> Dear Parquet dev list,
>>>>>>>>>>>>> There have been efforts in several projects for vectorized reads of
>>>>>>>>>>>>> Parquet.
>>>>>>>>>>>>> We had discussed during the Parquet sync up to organize a hackathon to
>>>>>>>>>>>>> brainstorm and look into a shared implementation.
>>>>>>>>>>>>> Some projects that would benefit:
>>>>>>>>>>>>> - Apache Drill
>>>>>>>>>>>>> - Apache Arrow
>>>>>>>>>>>>> - Apache Spark
>>>>>>>>>>>>> - Presto
>>>>>>>>>>>>> - Apache Hive
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm planning to organize this at the Dremio office in Mountain View with
>>>>>>>>>>>>> optionally a hangout for people who would want to join remotely.
>>>>>>>>>>>>> I'm adding to the "to:" people that have expressed interest or could be
>>>>>>>>>>>>> interested but that's not an exhaustive list. Please respond to this email
>>>>>>>>>>>>> if you wish to be included.
>>>>>>>>>>>>> Who's interested and what dates would work between this Tuesday 7/5 and
>>>>>>>>>>>>> Wednesday 7/20 ?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> regards,
>>>>>>>> Deepak Majeti
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Julien
>> 
>> 
>> 
>> --
>> Julien
>> 
