Hi all,

This is a follow-up to the meeting notes below. I created Jira ticket PARQUET-1396 <https://issues.apache.org/jira/browse/PARQUET-1396>, and the design can be found here <https://docs.google.com/document/d/17GTQAezl1ZC1pMNHjYU_bPVxMU6DIPjtXOiLclXUlyA>. The recorded video <https://www.youtube.com/watch?v=W38CrTUJ3YM> from the Hadoop Contributor Meetup may also help when reading the design. Please share your feedback by commenting on the design doc.
1. On top of Gidon’s change, we introduced a plugin/interface to Parquet to activate encryption and build up encryption properties. Currently, we have a schema-driven implementation of it, but it can be implemented in other ways too. I will send out the design soon.

Xinli

On Tue, Apr 30, 2019 at 12:30 PM Xinli shang <[email protected]> wrote:
> 4/30/2019
>
> Attendees:
> - Zoltan and several other folks (Cloudera)
> - Brian (SaS?)
> - Ryan Blue (Netflix)
> - Julien (WeWork)
> - Wes McKinney (Ursa Labs)
> - Gidon Gershinsky (IBM)
> - Steven (?)
> - Anikt (?)
> - Deepak (?)
> - Xinli Shang (Uber)
>
> Topics:
>
> 1. Key signing issue
>    1. Zoltan/Julien/Ryan:
>       1. We have already exchanged emails about this issue.
>       2. In the past, key signing was done in person, but it is OK to sign each other's keys via video conference. We can hold a video session for signing keys.
>       3. It is painful to do this every release.
>
> 2. Column encryption
>    1. Gidon:
>       1. The C++ version is progressing well; it is pretty much done.
>       2. Waiting for the Parquet-1.11.0 release before sending out the code for review.
>       3. Found issues in Java and worked around them. Will talk to the Java community.
>    2. Xinli:
>       1. On top of Gidon’s change, we introduced a plugin/interface to Parquet to activate encryption and build up encryption properties. Currently, we have a schema-driven implementation of it, but it can be implemented in other ways too. I will send out the design soon.
>    3. Gidon:
>       1. Overall, we took a bottom-up approach. We might need another layer on top of this to make adoption easier.
>    4. Ryan:
>       1. Different companies can have different implementations, so it is good to have a plugin mode.
>    5. Brian: Question about the key metadata and KMS.
>       1. Currently, Parquet defines the key metadata as a byte array. Depending on the implementation, it can be used to record the KMS/key metadata.
>
> 3. Parquet-1.11.0 release validation
>    1. Ryan:
>       1. Validate the write path of the column index: we need to test that the calculation is correct, and the validation is independent. Ryan will take this task.
>    2. Brian:
>       1. Can help with some testing in the summer if needed.
>    3. Steven:
>       1. What is the test strategy? Is there any fuzz testing?
>    4. Ryan:
>       1. We have some random tests, but they are not reliable. Inside Netflix, we have stable fuzz tests. We may need to port some of them to Parquet.
>    5. Xinli:
>       1. We have run a lot of regression tests on Parquet-1.11.0. We added the encryption code on top of 1.11.0 and ran a lot of tests. We have not tested the new features of 1.11.0, but the existing feature tests have passed so far. Let us know if you want us to add more tests to our test suite.
>
> 4. Remove old Parquet modules
>    1. Ryan:
>       1. We should remove old modules that are no longer needed.
>       2. Hive module: seems unused.
>       3. Scrooge module: if it is only used by one company, we might not want to maintain it.
>       4. Does anybody still use parquet-tools instead of parquet-cli? Maybe we can mark it as deprecated.
>       5. Open a Jira ticket for it.
>    2. Julien:
>       1. Twitter may use it. Julien will check with Twitter.
>       2. We should communicate this widely.
>
> --
> Xinli Shang (Uber)

--
Xinli Shang
