Attendance/Agenda:
Deepak (Vertica):
- indexing discussion
Wes (Two Sigma):
- indexing discussion
- parquet-cpp 1.1
Marcel (Cloudera Impala):
- Index proposal
- sort order clarification went in
Julien (Dremio):
- indexing
- protos
Lukas (parquet-proto):
- parquet-proto
Notes:
- parquet-proto:
- 3 changes on the way:
- issue with proto repeated fields, which are often not read by other
integrations
- add support for proto generic types (may break compatibility?)
- schema evolution using ids in proto fields.
- Lukas to send JIRAs
- would want to merge them soon and have a release
- Index proposal for improving point queries and range queries.
https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#
- todo (Marcel): clarify mechanism to store OffsetIndex and ColumnIndex
outside the footer (probably just before).
- todo (Marcel): add other optional fields from statistics in
ColumnIndex (min, max, null_count, distinct_count)
- todo (everyone): iterate on the feedback
- Impala prototype planned for June
- Logical types pull request:
https://github.com/apache/parquet-format/pull/51/files
- todo: give more feedback
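
Illustration for the index proposal item above: a minimal sketch of how per-page min/max values could let a reader skip pages for point and range queries. It assumes a ColumnIndex carries per-page statistics and an OffsetIndex carries per-page file locations; the class names, field names, and layout here are hypothetical and are not taken from the proposal document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    # Hypothetical ColumnIndex entry: statistics for one data page.
    min_value: int
    max_value: int
    null_count: int


@dataclass
class PageLocation:
    # Hypothetical OffsetIndex entry: where one data page lives in the file.
    offset: int
    compressed_size: int


def pages_for_range(column_index: List[PageStats],
                    offset_index: List[PageLocation],
                    lo: int, hi: int) -> List[PageLocation]:
    """Return locations of pages whose [min, max] overlaps [lo, hi]."""
    return [
        loc
        for stats, loc in zip(column_index, offset_index)
        if stats.max_value >= lo and stats.min_value <= hi
    ]


# Three pages of a column whose values happen to be sorted.
column_index = [PageStats(0, 99, 0), PageStats(100, 199, 2), PageStats(200, 299, 0)]
offset_index = [PageLocation(4, 1000), PageLocation(1004, 1000), PageLocation(2004, 1000)]

# Range query: 150 <= x <= 220 touches only the second and third pages.
range_hits = pages_for_range(column_index, offset_index, 150, 220)

# Point query: x == 42 is a range query with lo == hi.
point_hits = pages_for_range(column_index, offset_index, 42, 42)
```

With unsorted data the page ranges overlap more, so fewer pages can be skipped; the same overlap test still applies.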
--
Julien