Re: Parquet Sync up happening now

Julien Le Dem Fri, 06 Nov 2015 10:01:31 -0800

Sync up notes:

Next Hangout: Nov 23 10am Pacific Time
https://plus.google.com/u/0/events/ccj9di13bimjv6onkisemhiojjg


Attendees:
  - Daniel
  - Ryan
  - Jason
  - Julien

Agenda:
- ByteBuffer branch merged.
- Daniel: dictionary-based filter API change. Depends on the Hadoop API. Needs
discussion and review (in particular by Alex).
- Ryan: Vectorized read APIs discussion
- Jason: Union Type in Drill => in Parquet
- Parquet release.
- int64 encoding: feedback by Ryan

- ByteBuffer branch is now merged.
  - main features:
    - an allocator can be injected to control where/how the memory is
allocated by Parquet (on-heap/off-heap)
    - avoid copies by using the new ByteBuffer-based APIs for reading/writing
in Hadoop and in the compression codecs
  - There are other areas where we could integrate this further.
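To make the allocator-injection idea concrete, here is a minimal Python sketch, assuming a reader that receives its allocator from the caller instead of allocating memory itself. BufferAllocator, HeapAllocator, AnonymousMmapAllocator and PageReader are hypothetical names, not the parquet-mr API; an anonymous mmap only stands in for "off-heap" memory, and readinto() stands in for the copy-avoiding read path.

```python
import io
import mmap
from typing import Dict, Protocol


class BufferAllocator(Protocol):
    """Hypothetical allocator interface (illustrative; not the parquet-mr API)."""

    def allocate(self, size: int) -> memoryview: ...

    def release(self, buf: memoryview) -> None: ...


class HeapAllocator:
    """'On-heap' case: buffers are ordinary Python bytearrays."""

    def allocate(self, size: int) -> memoryview:
        return memoryview(bytearray(size))

    def release(self, buf: memoryview) -> None:
        buf.release()


class AnonymousMmapAllocator:
    """'Off-heap' analogue: anonymous memory maps outside the Python object heap.

    size must be > 0 (zero-length anonymous maps are rejected on some platforms).
    """

    def __init__(self) -> None:
        self._maps: Dict[int, mmap.mmap] = {}

    def allocate(self, size: int) -> memoryview:
        m = mmap.mmap(-1, size)
        view = memoryview(m)
        self._maps[id(view)] = m
        return view

    def release(self, buf: memoryview) -> None:
        m = self._maps.pop(id(buf))
        buf.release()  # drop the exported buffer before closing the map
        m.close()


class PageReader:
    """Hypothetical reader that gets its allocator by injection."""

    def __init__(self, allocator: BufferAllocator) -> None:
        self.allocator = allocator

    def read_page(self, source: io.BufferedIOBase, size: int) -> memoryview:
        # readinto() fills the allocator's buffer directly, without building an
        # intermediate bytes object. A production reader would loop on short reads.
        buf = self.allocator.allocate(size)
        n = source.readinto(buf)
        if n != size:
            self.allocator.release(buf)
            raise EOFError(f"expected {size} bytes, got {n}")
        return buf
```

The caller picks on-heap or off-heap memory by passing a different allocator; PageReader itself does not change.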

- Dictionary based filter:
  - some changes to access the dictionary directly in the file reader.
  - review page v2 to see if it is suitable for filters.
  - maybe we need more metadata files optimized for this.
  - add more metadata in the footer? (e.g. Bloom filters, ...)
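A minimal sketch of how a dictionary can drive row-group skipping, assuming the reader exposes each column chunk's decoded dictionary and whether every data page in that chunk was dictionary-encoded. ColumnChunkInfo and the helper functions are hypothetical, not the filter API under discussion. If any page fell back to plain encoding, the dictionary no longer covers all values and nothing can be skipped.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, List, Optional, Sequence, Set


@dataclass
class ColumnChunkInfo:
    """Hypothetical per-row-group summary of one column (not a parquet-mr class)."""
    dictionary: Optional[Set[Hashable]]  # decoded dictionary page, or None
    fully_dictionary_encoded: bool       # False if any data page fell back to plain


def _dictionary_is_complete(chunk: ColumnChunkInfo) -> bool:
    # Only when every page is dictionary-encoded does the dictionary contain
    # every non-null value stored in the chunk.
    return chunk.dictionary is not None and chunk.fully_dictionary_encoded


def can_skip_eq(chunk: ColumnChunkInfo, value: Hashable) -> bool:
    """True if no row in the chunk can satisfy `column == value`."""
    return _dictionary_is_complete(chunk) and value not in chunk.dictionary


def can_skip_in(chunk: ColumnChunkInfo, values: Iterable[Hashable]) -> bool:
    """True if no row in the chunk can satisfy `column IN values`."""
    return _dictionary_is_complete(chunk) and chunk.dictionary.isdisjoint(values)


def row_groups_to_read(chunks: Sequence[ColumnChunkInfo], value: Hashable) -> List[int]:
    """Indices of row groups that may contain `value` and must be scanned."""
    return [i for i, c in enumerate(chunks) if not can_skip_eq(c, value)]
```

Skipping stays sound even if a dictionary holds unused entries: absence from the dictionary still proves absence from the data.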

- Vectorized Read:
  - We need a clean API for vector-level access to Parquet data.
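The notes only state the need for such an API. As one illustration of a batch-at-a-time call, here is a hypothetical read_batch for a flat optional column (maximum definition level 1): Parquet stores only the non-null values, so the reader scatters them into a fixed-size vector using the definition levels and reports nulls as a mask. ColumnVector and read_batch are invented names; a real implementation would use typed arrays rather than Python lists.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ColumnVector:
    """Hypothetical batch container: one slot per row plus a null mask."""
    values: List[object]
    is_null: List[bool]


def read_batch(def_levels: Sequence[int], non_null_values: Sequence[object],
               max_def_level: int = 1) -> ColumnVector:
    """Scatter densely stored non-null values into a vector of len(def_levels) slots.

    A definition level equal to max_def_level means the value is present;
    anything lower means null. Present values are consumed in order.
    """
    present = sum(1 for level in def_levels if level == max_def_level)
    if present != len(non_null_values):
        raise ValueError("definition levels do not match the number of values")
    n = len(def_levels)
    values: List[object] = [None] * n
    is_null = [True] * n
    it = iter(non_null_values)
    for i, level in enumerate(def_levels):
        if level == max_def_level:
            values[i] = next(it)
            is_null[i] = False
    return ColumnVector(values, is_null)
```

The point of the sketch is the API shape (one call per batch, nulls as a mask), not the speed of a Python loop.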

- Union Type:
  - Avro implementation of union: one field per branch (member0, member1, ...)
  - Jason to send a proposal.
  - this would be a good time to make a release.
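A sketch of the member0, member1, ... layout, assuming each union branch maps to one optional field and exactly one field is non-null per record. The null branch is assumed to be handled by making the whole group optional, so it is not listed as a branch here. union_to_members and members_to_union are hypothetical helpers, not parquet-avro or Drill functions, and branch matching uses Python isinstance where Avro would resolve by schema.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def union_to_members(value: Any, branch_types: Sequence[type]) -> Dict[str, Optional[Any]]:
    """Encode a union value as a record with one optional field per branch.

    The first branch whose type matches gets the value; all other memberN
    fields are None. (Note: in Python, bool is a subclass of int.)
    """
    record: Dict[str, Optional[Any]] = {f"member{i}": None for i in range(len(branch_types))}
    for i, t in enumerate(branch_types):
        if isinstance(value, t):
            record[f"member{i}"] = value
            return record
    raise TypeError(f"{type(value).__name__} matches no union branch")


def members_to_union(record: Dict[str, Optional[Any]]) -> Tuple[int, Any]:
    """Decode back to (branch index, value); exactly one member must be set."""
    set_members = [(int(k[len("member"):]), v) for k, v in record.items() if v is not None]
    if len(set_members) != 1:
        raise ValueError("expected exactly one non-null member")
    return set_members[0]
```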

- parquet-mr release:
  - Drill wants to depend on an official version.

- SLF4J in Parquet? We should improve the logging configuration in Parquet.

- merge Parquet files: to be reviewed

- int64 delta encoding: to be reviewed
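The notes do not describe the int64 encoding under review. As a minimal sketch of the general idea, loosely modeled on Parquet's DELTA_BINARY_PACKED encoding (store the first value, compute deltas, subtract the minimum delta so all stored numbers are non-negative, then record the bit width they need), here is a simplified encoder/decoder. It omits blocks, miniblocks, varint headers and the actual bit-packing; delta_encode, delta_decode and wrap_int64 are hypothetical names.

```python
from typing import List, Sequence, Tuple

INT64_MASK = (1 << 64) - 1


def wrap_int64(x: int) -> int:
    """Interpret x modulo 2**64 as a signed 64-bit integer."""
    x &= INT64_MASK
    return x - (1 << 64) if x >= (1 << 63) else x


def delta_encode(values: Sequence[int]) -> Tuple[int, int, int, List[int]]:
    """Return (first_value, min_delta, bit_width, relative_deltas).

    Deltas use int64 wraparound; subtracting min_delta makes every stored
    number non-negative, so each fits in bit_width bits.
    """
    if not values:
        raise ValueError("need at least one value")
    deltas = [wrap_int64(b - a) for a, b in zip(values, values[1:])]
    min_delta = min(deltas, default=0)
    relative = [d - min_delta for d in deltas]  # each in [0, 2**64)
    bit_width = max(relative, default=0).bit_length()
    return values[0], min_delta, bit_width, relative


def delta_decode(first: int, min_delta: int, relative: Sequence[int]) -> List[int]:
    """Rebuild the values by adding (min_delta + relative delta) with int64 wraparound."""
    out = [first]
    for r in relative:
        out.append(wrap_int64(out[-1] + min_delta + r))
    return out
```

For example, [100, 101, 103, 102, 110] has deltas [1, 2, -1, 8]; subtracting the minimum delta (-1) gives [2, 3, 0, 9], which fit in 4 bits each.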

On Wed, Nov 4, 2015 at 10:04 AM, Julien Le Dem <[email protected]> wrote:

> https://plus.google.com/hangouts/_/event/cglct6qpocrf70n35mvtnnblvgc
>
> --
> Julien
>

-- 
Julien
