Attendance:
 - Daniel (Netflix), Jason (Drill, MapR), Sanjeev (Twitter), Sergio (Cloudera), Julien (Twitter), Ryan (Cloudera), Zhenxiao (Netflix), Nezih (Netflix)

Agenda:
 * 1.8.0 release blockers: https://issues.apache.org/jira/browse/PARQUET-292
   - int64 patch for delta encoding => protected by the parquet 2.0 flag
   - predicate push down fix
 * How we can make releases easier and faster
   - Create a release JIRA and add blockers to it. Bugs should always be listed; a fix does not have to be a release blocker to be included in the release.
   - Release instructions are in the repo. (How to release, including the vote)
   - There's a "How to verify a release" wiki.
   - We should stay on top of reviews so that blockers get resolved in a timely manner, all the time and not just when we want to release.
 * PR/JIRA backlog:
   - Some are features that have been merged another way and should be closed.
   - Some are against the filter API and need review from knowledgeable people (Alex, for example).
   - We should get together and do a JIRA/PR session.
   - bytebuffer/vectorization review
   - Ryan will propose a time and send an invite.
 * Microsecond-precision time and timestamp specs (PARQUET-200)
   - Microsecond is the standard, and nanoseconds are usually not accurate.
   - Drill (Jacques) should review.
   - Just needs a final approval.
 * Row group and HDFS block alignment (HDFS-3689)
   - Makes sure row groups are aligned with HDFS blocks.
   - Optional: padding in the meantime. The format would support adding gaps in between row groups.
 * Add some sanity checks to verify that footer data is sensible.
   - Jason will open a JIRA.
 * Version numbers in created_by
   - Needed to add version-based control (i.e. ignore dic stats from before a given version that had a bug).
   - We could use parquet-generator to generate the Version class. (Ryan will look into this)
 * Vectorization status (Sergio and Dong would like to help)
   - Zhenxiao rebased the branch.
   - Comments from Dong Chen (Intel China, 13 hours later than PDT) have been addressed for the most part. Dong should review again.
   - Page read can be optimized. Lazy load.
   - Building the Presto implementation; Netflix will post results.
   - Encodings should be updated to decode the entire page at once.
   - Daniel will list all the subtasks under PARQUET-131.
 * Bytebuffer merge (zero copy)
   - Drill will do a last rebase.
   - Some low-level API changes regarding accessing ByteBuffers instead of byte[] need to be reviewed by the low-level integrations (Drill, Presto).
   - Breaking changes in that area should be acceptable as they are internal.

On Tue, Jun 2, 2015 at 10:03 AM, Julien Le Dem <[email protected]> wrote:

> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up?authuser=0
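
To make the "Version numbers in created_by" item more concrete, here is a rough Python sketch of what version-based control could look like. It is only an illustration, not the Version class discussed in the meeting. It assumes the created_by string looks like "parquet-mr version 1.8.0 (build <hash>)"; the function names and the 1.8.0 cutoff are hypothetical and would need to be replaced by whatever version actually carries the fix.

```python
import re
from typing import Optional, Tuple

# Assumed created_by layout (not confirmed by these notes):
#   "<application> version <major>.<minor>.<patch> (build <hash>)"
CREATED_BY = re.compile(
    r"^(?P<app>.+?) version (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)

# Hypothetical cutoff: first release assumed to write trustworthy stats.
FIRST_GOOD_VERSION = (1, 8, 0)


def parse_created_by(created_by: Optional[str]) -> Optional[Tuple[str, Tuple[int, int, int]]]:
    """Return (application, (major, minor, patch)), or None if unparseable."""
    match = CREATED_BY.match(created_by or "")
    if match is None:
        return None
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return match["app"], version


def should_ignore_stats(created_by: Optional[str]) -> bool:
    """Decide whether stats written by this file's writer should be ignored."""
    parsed = parse_created_by(created_by)
    if parsed is None:
        # Unknown or missing writer string: be conservative and ignore stats.
        return True
    app, version = parsed
    # Assumption: the bug only affects parquet-mr releases before the cutoff.
    return app == "parquet-mr" and version < FIRST_GOOD_VERSION


if __name__ == "__main__":
    for writer in ("parquet-mr version 1.6.0 (build abc123)",
                   "parquet-mr version 1.8.0 (build abc123)"):
        print(writer, "->", should_ignore_stats(writer))
```

Comparing versions as integer tuples avoids the usual string-comparison pitfall (for example "1.10.0" sorting before "1.8.0"), and treating an unparseable created_by as untrusted keeps the check on the safe side.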
