Sync up notes:
Next Hangout: Nov 23 10am Pacific Time
https://plus.google.com/u/0/events/ccj9di13bimjv6onkisemhiojjg
Attendees:
- Daniel
- Ryan
- Jason
- Julien
Agenda:
- ByteBuffer branch merged.
- Daniel: Dictionary-based filter API change. Depends on Hadoop API. Needs
discussion and review (in particular from Alex)
- Ryan: Vectorized read APIs discussion
- Jason: Union Type in Drill => in Parquet
- parquet release.
- int64 encoding: feedback by Ryan
- ByteBuffer branch is now merged.
  - main features:
    - allocator can be injected to control where/how the memory is
      allocated by Parquet (on-heap/off-heap)
    - avoid copies by using new byte buffer based APIs for reading/writing
      in Hadoop and the codecs
  - There are areas where we could integrate this more.
- Dictionary based filter:
  - some changes to access the dictionary directly in the file reader.
  - review page v2 to see if it’s good for filters.
  - maybe we need more metadata files optimized for this.
  - add more metadata in the footer? (ex: Bloom filters, ...).
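
For reference, a minimal sketch of the idea behind dictionary based
filtering, not the proposed API. It assumes every page of a column chunk is
dictionary-encoded (no fallback to plain encoding), so the dictionary lists
every value in the chunk. ColumnChunk, can_skip and filter_rows are
hypothetical names for illustration only, written in Python for brevity.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ColumnChunk:
    """Hypothetical stand-in for one fully dictionary-encoded column chunk."""
    dictionary: List[str]  # distinct values present in this chunk
    indices: List[int]     # one dictionary index per row


def can_skip(chunk: ColumnChunk, wanted: Set[str]) -> bool:
    """True when no dictionary entry matches, so no row in the chunk can match."""
    return wanted.isdisjoint(chunk.dictionary)


def filter_rows(chunks: List[ColumnChunk], wanted: Set[str]) -> List[str]:
    """Return matching values, decoding only chunks the dictionary cannot rule out."""
    matches: List[str] = []
    for chunk in chunks:
        if can_skip(chunk, wanted):
            continue  # chunk pruned by reading only its dictionary
        for index in chunk.indices:
            value = chunk.dictionary[index]
            if value in wanted:
                matches.append(value)
    return matches
```

The point is that the check reads only the dictionary, which is usually much
smaller than the data pages it lets us skip.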
- Vectorized Read:
  - We need a clean API for Vector level access of Parquet
- Union Type:
  - Avro impl of union: member0, member1, ...
  - Jason to send a proposal.
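
A rough sketch of what the member0, member1, ... layout could mean, as we
understand it: the union becomes a group with one optional field per branch,
and exactly one field is set for each value. This is an illustration in
Python, not Jason's proposal; encode_union and decode_union are hypothetical
names, and a null branch is not handled here.

```python
from typing import Any, Dict, List, Optional


def encode_union(branch_types: List[type], value: Any) -> Dict[str, Optional[Any]]:
    """Encode a value as a record with one optional field per union branch."""
    record: Dict[str, Optional[Any]] = {
        f"member{i}": None for i in range(len(branch_types))
    }
    for i, branch in enumerate(branch_types):
        # Exact type match, so that a bool is not stored in an int branch.
        if type(value) is branch:
            record[f"member{i}"] = value
            return record
    raise TypeError(f"{value!r} matches no branch of the union")


def decode_union(record: Dict[str, Optional[Any]]) -> Any:
    """Return the single non-null member of an encoded union record."""
    present = [v for v in record.values() if v is not None]
    if len(present) != 1:
        raise ValueError("expected exactly one non-null member")
    return present[0]
```

With branches [int, str], the value "x" is stored as
{"member0": None, "member1": "x"}.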
- would be a good time to make a release.
- parquet-mr release:
  - Drill wants to depend on an official version
- slf4j in parquet? We should improve the logging config in Parquet.
- merge Parquet files: to be reviewed
- int64 delta encoding: to be reviewed
On Wed, Nov 4, 2015 at 10:04 AM, Julien Le Dem <[email protected]> wrote:
> https://plus.google.com/hangouts/_/event/cglct6qpocrf70n35mvtnnblvgc
>
> --
> Julien
>
--
Julien