4/30/2019

Attendees:

Zoltan and several other folks (Cloudera)
Brian (SaS?)
Ryan Blue (Netflix)
Julien (WeWork)
Wes McKinney (Ursa Labs)
Gidon Gershinsky (IBM)
Steven (?)
Anikt (?)
Deepak (?)
Xinli Shang (Uber)


Topics:

   1. Key signing issue
      1. Zoltan/Julien/Ryan:
         1. We have already discussed this issue by email.
         2. In the past, key signing was done in person, but it is OK to
            sign each other's keys via video conference. We can hold a
            video session for signing keys.
         3. It is painful to do this for every release.



   2. Column Encryption
      1. Gidon:
         1. The C++ version is progressing well and is pretty much done.
         2. Waiting for the Parquet-1.11.0 release before sending the
            code out for review.
         3. Found issues in Java and worked around them. Will talk to
            the Java community.
      2. Xinli:
         1. On top of Gidon's change, we introduced a plugin/interface
            to Parquet to activate encryption and build up the
            encryption properties. Currently, we have a schema-driven
            implementation, but it can be implemented in other ways
            too. I will send out the design soon.
      3. Gidon:
         1. Overall, we took a bottom-up approach. We might need another
            layer on top of these to make adoption easier.
      4. Ryan:
         1. Different companies can have different implementations. It
            is good to have a plugin mode.
      5. Brian: Question about the key metadata and KMS.
         1. Currently, Parquet defines it as a byte array. Depending on
            the implementation, it can be used to record the KMS/key
            metadata.
   3. Parquet-1.11.0 Release Validation
      1. Ryan:
         1. Validate the write path of the column index. We need to test
            that the calculation is correct; the validation is
            independent. Ryan will take this task.
      2. Brian:
         1. Can help with some testing in the summer if needed.
      3. Steven:
         1. What is the test strategy? Is there any fuzz testing?
      4. Ryan:
         1. We have some random tests, but they are not reliable. Inside
            Netflix, we have stable fuzz tests. We may need to port
            some of them to Parquet.
      5. Xinli:
         1. We have run a lot of regression tests on Parquet-1.11.0. We
            added encryption code on top of 1.11.0 and ran a lot of
            tests. We have no tests for the new features in 1.11.0, but
            the tests of existing features look good so far. Let us
            know if you want us to add more tests to our test suite.



   4. Remove old Parquet modules
      1. Ryan:
         1. We should remove the old modules if they are not needed.
         2. Hive module: seems not to be used.
         3. Scrooge module: if it is only used by one company, we might
            not want to maintain it.
         4. Does anybody still use parquet-tools instead of parquet-cli?
            Maybe we can mark it as deprecated.
         5. Open a Jira ticket for it.
      2. Julien:
         1. Twitter may use it. Julien will check with Twitter.
         2. We should communicate widely.


-- 
Xinli Shang (Uber)
