7/17/2019

Attendees:

Ryan Blue (Netflix)

Jame (Netflix)

Gidon Gershinsky (IBM)

Steven (Yelp)

Deepak and several other folks (Vertica)

Xinli Shang (Uber)

Junjie Chen

Topics:

   1. Column Encryption
      1. Gidon:
         1. C++ version code review: all feedback has been addressed. The
            last step is testing, which will hopefully be done tomorrow.
         2. Reviewed the bloom filter design from the Parquet encryption
            perspective. It is straightforward.
         3. Not much done on the Java version (parquet-mr) side. Worked
            with Xinli to fix several issues.
         4. Found throughput issues in Java and fixed them.
      2. Xinli:
         1. Gidon sent out a design that consolidates the different ways
            of deploying Parquet encryption, but it has not gained much
            attention from the community. Please have a look if you are
            interested.
         2. There is a discussion about unifying the table properties in
            HMS (HIVE-21848) for both ORC and Parquet column encryption.
            Please chime in if you have a concern.
         3. Review of the Java version parquet-mr PRs is slow. How do we
            move faster? We need more people to review them.
            1. https://github.com/apache/parquet-mr/pull/613
            2. https://github.com/apache/parquet-mr/pull/614
            3. https://github.com/apache/parquet-mr/pull/643
      3. Jim
         1. What is blocked on the parquet-mr review? We need more people
            to review it. There are a lot of PRs now.
      4. Deepak
         1. Does Parquet encryption work with Hive?
            1. Yes, we have tested it (Xinli).
         2. Also has questions about the table properties definition.
            1. HIVE-21848 (Xinli)
   2. Bloom filter
      1. Junjie Chen
         1. We need one more PMC vote.
      2. Ryan
         1. I will have a look next week. Were the issues raised earlier
            addressed?
            1. Yes (Junjie)
         2. parquet-format should be considered the upstream for
            parquet-cpp and parquet-mr, which are implementations.
         3. We need the encryption specification merged into
            parquet-format ASAP, then the bloom filter. Otherwise,
            parquet-format will depend on parquet-cpp and parquet-mr,
            which is not right.

   https://github.com/apache/parquet-format/pull/68

   https://github.com/apache/parquet-format/pull/142
      3. Xinli
         1. Is parquet-format 2.6 + encryption compatible with parquet
            2.7 (encryption + bloom filter)?
            1. By design, yes (Gidon)
         2. Please add Xinli for testing if we have a prototype for the
            bloom filter, to make sure they are compatible.



   3. Parquet-1.11.0 Release Validation
      1. Ryan
         1. Both Ryan and Zalton are very busy. No progress so far.
         2. We need to write a test to make sure data is written and read
            correctly.



   4. Remove old Parquet modules
      1. Ryan
         1. No time. If somebody has time to do it, go for it.


-- 
Xinli Shang
