Re: Parquet sync up happening now

Julien Le Dem Tue, 02 Jun 2015 11:34:23 -0700

Attendance:
 - Daniel (Netflix), Jason (Drill, MapR), Sanjeev (Twitter), Sergio
(Cloudera), Julien (Twitter), Ryan (Cloudera), Zhenxiao (Netflix), Nezih
(Netflix)


Agenda:
* 1.8.0 release blockers:
  https://issues.apache.org/jira/browse/PARQUET-292
  - int64 patch for delta encoding => protected by the parquet 2.0 flag
  - predicate push down fix

* How we can make releases easier and faster
  - Create a release JIRA + add blockers to it. Bugs should always be
listed. An issue does not need to be a release blocker to be included in
the release.
  - Release instructions are in the repo. (How to release, including vote)
  - there's a "How to verify a release" wiki
  - We should stay on top of reviews to make sure blockers get resolved
in a timely manner, not just when we want to release but all the time.

* PR/JIRA backlog:
  - some are features that have been merged another way and should be
closed.
  - some are against the filter API and need review from knowledgeable
people (Alex for example)
  - we should get together and do a JIRA/PR session.
  - bytebuffer/vectorization review
  - Ryan will propose a time/send an invite

* Microsecond-precision time and timestamp specs (PARQUET-200)
  - microsecond is the standard and nanoseconds are usually not accurate
  - Drill (Jacques) should review
  - just needs a final approval
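
For illustration only (not from the meeting): a minimal sketch of what
microsecond precision means for a writer, assuming the timestamp is stored
as a 64-bit integer count of microseconds since the Unix epoch in UTC. The
helper names to_micros and from_micros are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are counted from the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def to_micros(ts: datetime) -> int:
    """Return microseconds since the epoch as an exact integer."""
    # Dividing one timedelta by another with // yields an int,
    # so no floating-point rounding is involved.
    return (ts - EPOCH) // ONE_MICROSECOND


def from_micros(value: int) -> datetime:
    """Inverse of to_micros."""
    return EPOCH + timedelta(microseconds=value)


ts = datetime(2015, 6, 2, 18, 34, 23, 123456, tzinfo=timezone.utc)
micros = to_micros(ts)
assert from_micros(micros) == ts          # lossless round trip
assert -2**63 <= micros < 2**63           # fits in a signed 64-bit integer
```

Python's datetime itself only carries microseconds, so the round trip is
lossless here; a nanosecond source would have to be truncated or rounded
first.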

* Row group and HDFS block alignment (HDFS-3689)
  - makes sure row groups are aligned with HDFS blocks
  - optional: padding in the meantime. The format would support adding gaps
in between row groups.
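
For illustration only (not from the meeting): a rough sketch of the padding
idea, assuming the writer knows the HDFS block size and caps how many bytes
it is willing to waste. The function name and the max_padding parameter are
hypothetical, not an existing Parquet setting.

```python
def padding_for_next_row_group(position: int, block_size: int,
                               max_padding: int) -> int:
    """Bytes of padding to write at `position` so that the next row group
    starts on a block boundary. Returns 0 when the position is already
    aligned or when the gap would exceed `max_padding`."""
    # Bytes left before the next multiple of block_size.
    remaining = (-position) % block_size
    if remaining == 0 or remaining > max_padding:
        return 0
    return remaining


MIB = 1024 * 1024
block = 128 * MIB

# 1 MiB short of the boundary: pad so the next row group starts a new block.
assert padding_for_next_row_group(127 * MIB, block, 8 * MIB) == 1 * MIB
# Half a block away: too much to waste, so keep writing without padding.
assert padding_for_next_row_group(64 * MIB, block, 8 * MIB) == 0
```

The cap is a trade-off: a larger max_padding aligns more row groups but
wastes more space in the file.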

* Add some sanity checks to verify footer data is sensible.
  - Jason will open a JIRA
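
For illustration only (not from the meeting): a sketch of the kind of checks
this could mean, assuming a simplified view of the footer. RowGroupInfo and
check_footer are hypothetical stand-ins, not the actual metadata classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupInfo:
    # Hypothetical, simplified stand-in for per-row-group footer metadata.
    start_offset: int      # file offset where the row group's data begins
    compressed_size: int   # bytes the row group occupies in the file
    num_rows: int


def check_footer(file_length: int, footer_start: int, num_rows: int,
                 row_groups: List[RowGroupInfo]) -> List[str]:
    """Return a list of problems found; an empty list means no issue seen."""
    problems = []
    if not 0 < footer_start <= file_length:
        problems.append("footer start is outside the file")
    previous_end = 0
    for i, rg in enumerate(row_groups):
        if rg.num_rows < 0 or rg.compressed_size < 0:
            problems.append(f"row group {i}: negative size or row count")
        if rg.start_offset < previous_end:
            problems.append(f"row group {i}: overlaps the previous one")
        end = rg.start_offset + rg.compressed_size
        if end > footer_start:
            problems.append(f"row group {i}: extends into the footer")
        previous_end = max(previous_end, end)
    if sum(rg.num_rows for rg in row_groups) != num_rows:
        problems.append("row counts do not add up to the file total")
    return problems


good = [RowGroupInfo(4, 400, 10), RowGroupInfo(404, 400, 20)]
assert check_footer(1000, 900, 30, good) == []
```

Returning a list instead of raising on the first problem lets a reader report
everything that looks wrong in one pass.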

* Version numbers in created_by
  - needed to add version-based control (i.e. ignore dic stats from before
a given version that had a bug)
  - we could use parquet-generator to generate the Version class. (Ryan
will look into this)
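
For illustration only (not from the meeting): a sketch of version-based
control, assuming created_by looks like
"<application> version <x.y.z> (build <hash>)". The cutoff version and the
function names are hypothetical placeholders, not a statement about which
release contains the fix.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Assumed shape of the string; anything after x.y.z is ignored.
CREATED_BY = re.compile(
    r"^(?P<app>\S+) version (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)

# Hypothetical cutoff: first version assumed to write correct stats.
FIRST_FIXED: Version = (1, 8, 0)


def parse_created_by(created_by: str) -> Optional[Tuple[str, Version]]:
    """Return (application, (major, minor, patch)), or None if unparseable."""
    m = CREATED_BY.match(created_by)
    if m is None:
        return None
    version = (int(m.group("major")), int(m.group("minor")),
               int(m.group("patch")))
    return m.group("app"), version


def should_ignore_stats(created_by: str) -> bool:
    parsed = parse_created_by(created_by)
    if parsed is None:
        return True  # unknown writer: be conservative and skip the stats
    app, version = parsed
    # Tuples compare element by element, so (1, 6, 0) < (1, 8, 0).
    return app == "parquet-mr" and version < FIRST_FIXED


assert should_ignore_stats("parquet-mr version 1.6.0 (build abc)")
assert not should_ignore_stats("parquet-mr version 1.8.0 (build abc)")
```

Treating an unparseable string as "ignore" is one possible policy; a reader
could equally choose to trust unknown writers.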

* Vectorization status (Sergio and Dong would like to help)
  - Zhenxiao rebased the branch.
  - Dong Chen's (Intel China, 15 hours ahead of PDT) comments have been
addressed for the most part. Dong should review again.
  - page read can be optimized. Lazy load.
  - building presto implementation, Netflix will post results.
  - Encodings should be updated to decode the entire page at once.
  - Daniel will list all the subtasks under PARQUET-131.

* Bytebuffer merge (zero copy)
  - Drill will do a last rebase.
  - some low-level API changes regarding accessing ByteBuffers instead of
byte[]: need to be reviewed by the low-level integrations (Drill, Presto)
  - breaking changes in that area should be acceptable as they are
internal.



On Tue, Jun 2, 2015 at 10:03 AM, Julien Le Dem <[email protected]> wrote:

> https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up?authuser=0
>
