This will be based on what participants want to work on :) We’d want to share ideas and existing implementations.
Nezih’s patch is one of them, done in the context of Presto.
There are some vectorized readers in SparkSQL as well. For example, this one:
https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
And others in the same package:
https://github.com/apache/spark/tree/158af162eac7348464c6751c8acd48fc6c117688/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet
Hong started this effort.
What time zone are you in?

> On Jul 4, 2016, at 7:57 AM, Xu, Cheng A <[email protected]> wrote:
>
> Hi Julien,
> I am working on Hive Parquet Vectorization implementations. And I am based on
> the existing PR from Nezih https://github.com/apache/parquet-mr/pull/257.
> Will this hackathon be based on that implementation as well or plan a new
> shared one?
>
> Thanks
> Ferd
>
> -----Original Message-----
> From: Julien Le Dem [mailto:[email protected]]
> Sent: Saturday, July 2, 2016 7:01 AM
> To: [email protected]; nezih yigitbasi <[email protected]>;
> Daniel Weeks <[email protected]>; Ryan Blue <[email protected]>; Steven
> Phillips <[email protected]>; Nong Li <[email protected]>; Alex Levenson
> <[email protected]>; [email protected]; Wes McKinney
> <[email protected]>; [email protected]; [email protected]; Jacques
> Nadeau <[email protected]>
> Subject: Parquet Vectorized Read hackathon
>
> Dear Parquet dev list,
> There have been efforts in several projects for vectorized reads of Parquet.
> We had discussed during the Parquet sync up to organize a hackathon to
> brainstorm and look into a shared implementation.
> Some projects that would benefit:
> - Apache Drill
> - Apache Arrow
> - Apache Spark
> - Presto
> - Apache Hive
>
> I'm planning to organize this at the Dremio office in Mountain View with
> optionally a hangout for people who would want to join remotely.
> I'm adding to the "to:" people that have expressed interest or could be
> interested but that's not an exhaustive list. Please respond to this email if
> you wish to be included.
> Who's interested and what dates would work between this Tuesday 7/5 and
> Wednesday 7/20 ?
>
> --
> Julien
