Hi all,

This is a follow-up to the meeting notes below. I created Jira ticket
PARQUET-1396 <https://issues.apache.org/jira/browse/PARQUET-1396> and the
design can be found here
<https://docs.google.com/document/d/17GTQAezl1ZC1pMNHjYU_bPVxMU6DIPjtXOiLclXUlyA>.
The recorded video <https://www.youtube.com/watch?v=W38CrTUJ3YM> from the
Hadoop Contributor Meetup may also help in reading the design. Please share
your feedback by commenting on the design doc.


   1. On top of Gidon’s change, we introduced a plugin/interface to Parquet
   to activate encryption and build up encryption properties. Currently, we
   have a schema-driven implementation of it, but it can be implemented in
   other ways too. I will send out the design soon.


Xinli

On Tue, Apr 30, 2019 at 12:30 PM Xinli shang <[email protected]> wrote:

> 4/30/2019
>
> Attendees:
>
> Zoltan and several other folks (Cloudera)
> Brian (SaS?)
> Ryan Blue (Netflix)
> Julien (WeWork)
> Wes McKinney (Ursa Labs)
> Gidon Gershinsky (IBM)
> Steven (?)
> Anikt (?)
> Deepak (?)
> Xinli Shang (Uber)
>
>
> Topics:
>
>    1. Key signing issue
>       1. Zoltan/Julien/Ryan:
>          1. We already have an email exchange on this issue.
>          2. In the past, it was done in person, but it is OK to sign each
>             other's keys via video conference. We can do a video session
>             for signing keys.
>          3. It is painful to do this for every release.
>
>
>
>    2. Column Encryption
>       1. Gidon:
>          1. The C++ version is progressing well. It is pretty much done.
>          2. Waiting for the Parquet-1.11.0 release before sending out the
>             code review.
>          3. Found issues in Java and worked around them. Will talk to the
>             Java community.
>       2. Xinli:
>          1. On top of Gidon’s change, we introduced a plugin/interface to
>             Parquet to activate encryption and build up encryption
>             properties. Currently, we have a schema-driven implementation
>             of it, but it can be implemented in other ways too. I will
>             send out the design soon.
>       3. Gidon:
>          1. Overall we took a bottom-up approach. We might need another
>             layer on top of these to make adoption easier.
>       4. Ryan:
>          1. Different companies can have different implementations. It is
>             good to have a plugin mode.
>       5. Brian: Question about the key metadata and KMS.
>          1. Currently, Parquet defines it as a byte array. Depending on
>             the implementation, it can be used to record the KMS/key
>             metadata.
>    3. Parquet-1.11.0 Release Validation
>       1. Ryan:
>          1. Validate the write path of the column index. We need to test
>             that the calculation is correct; validation is independent.
>             Ryan will take this task.
>       2. Brian:
>          1. Can help with some testing in the summer if needed.
>       3. Steven:
>          1. What is the test strategy? Any fuzz testing?
>       4. Ryan:
>          1. We have some random tests, but they are not reliable. Inside
>             Netflix, we have stable fuzzing tests. We may need to port
>             some of them to Parquet.
>       5. Xinli:
>          1. We have run a lot of regression tests on Parquet-1.11.0. We
>             added encryption code on top of 1.11.0 and ran a lot of tests.
>             We have not tested the new features of 1.11.0, but the
>             existing feature tests are good so far. Let us know if you
>             want us to add more tests to our test suite.
>
>
>
>    4. Remove old Parquet modules
>       1. Ryan:
>          1. We should remove those old modules if they are not needed.
>          2. Hive module: seems not to be used.
>          3. Scrooge module: if it is only used by one company, we might
>             not want to maintain it.
>          4. Does anybody still use parquet-tools instead of parquet-cli?
>             Maybe we can mark it as deprecated.
>          5. Open a Jira ticket for it.
>       2. Julien:
>          1. Twitter may use it. Julien will check with Twitter.
>          2. We should communicate widely.
>
>
> --
> Xinli Shang (Uber)
>


-- 
Xinli Shang
