Notes:

Attendees and topics of interest:
- Julien: Twitter. 1.7.0 release, merging ByteBuffer access branch
- Alex: Twitter.
- Daniel: Netflix. GSoC (ByteBuffer access), 1.7.0 release, vectorized read path, schema evolution
- Mickael: Criteo.
- Ryan: Cloudera. 1.7.0 release, SemVer versioning, reviewing pull requests, schema evolution
- Sanjeev: Twitter. Support for interactive query.
- Sergio: Cloudera. Hearing about vectorization
- Tianshuo: Twitter

Agenda:
- Schema evolution
- 1.7.0 release
- merging ByteBuffer
- Vectorized exec update
- SemVer
- PR review
- interactive queries

Discuss:
- Schema evolution strategies:
  - index-based access of the file: columns can only be added at the end. Fields cannot be deleted, but can be renamed.
  - name-based access: fields cannot be renamed, but columns can be deleted and added anywhere.
  - have another identifier for the column to enable the best of both worlds.
- 1.7 release: go through and make sure everything is renamed to org.apache. Ryan to do soon.
- merging ByteBuffer: merge the org.parquet rename into the branch, then merge the branch right after the 1.7 release.
- Vectorized exec update: Netflix is picking it up. Waiting on the rename and the ByteBuffer read path. The PARQUET-131 JIRA has a link to the GitHub repo. Update by Dong Chen. The goal is to integrate with Presto. The Drill team and Chang Lian from the Spark team should review as well.
- SemVer: have one version number for the format and one version number for the library. The library version increases whenever there is a breaking change in the API or the format. Starting in 2.0, the writer must be provided with the format version.
- Review Pull Requests:
  - Spark has a tool: https://spark-prs.appspot.com/open-prs#all
  - go look at the PRs and ping relevant people.
- interactive queries:
  - documenting the capabilities of SQL engines and their level of integration with Parquet:
    - Presto
    - Drill
    - Impala: has done a lot of code-gen work to be fast. Lacks nested types (80% done, in Impala 2.4). Impala uses index access to columns (restricts schema evolution). New encodings will come after.
    - Spark SQL

On Tue, May 12, 2015 at 10:07 AM, Julien Le Dem <[email protected]> wrote:

> Happening now:
>
> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up?authuser=0&hceid=anVsaWVuQHR3aXR0ZXIuY29t.8ojja1ffv4jnptqalci3qebf8o
>
