Hi,

Sorry, that was an error on my side: I suggested that Nandor add a TLDR section with this title. I agree with your comment, Wes; "outcome" would have been a better choice of word than "decision".

Br,

Zoltan

On Fri, Aug 17, 2018 at 6:36 PM Wes McKinney <[email protected]> wrote:

> hi Nandor,
>
> A fine detail, and I may be wrong, but I don't think decisions can
> technically be made on a call because time zones do not permit
> everyone to join always and not all collaborators are comfortable
> having live discussions in English. see [1]
>
> You can present the consensus of the participants in the call summary
> and others in the community have an opportunity to provide feedback.
> The "decision" is therefore one based on lazy consensus thereafter if
> there are no objections or follow up discussion
>
> - Wes
>
> [1]: https://www.apache.org/foundation/how-it-works.html#management
>
> On Fri, Aug 17, 2018 at 8:38 AM, Nandor Kollar
> <[email protected]> wrote:
> > Topics discussed and decisions (meeting held on 2018 August 15th, at
> > 6pm CET / 9 am PST):
> >
> > - Aligning page row boundaries between different columns: Debated,
> > please follow-up
> > - Remove Java specific code from parquet-format: Accepted
> > - Column encryption: Please review
> > - Parquet-format release: Scope accepted
> > - C++ mono-repo: Please vote
> >
> >
> >
> > Aligning page row boundaries between different columns (Gabor)
> > --------------------------------------------------------------
> >
> > Background: In the existing specification of column indexes, page
> > boundaries are not aligned between different column in respect to row
> > count.
> >
> > Gabor: implemented this logic, interested parties can review the code here:
> > - https://github.com/apache/parquet-mr/pull/509
> > - https://github.com/apache/parquet-mr/commits/column-indexes
> >
> > Main takeaway from implementation:
> >
> > - Index filtering logic as currently specified is overcomplicated.
> > - May become a maintenance burden and results in steep learning curve
> > for onboarding - new developers.
> > - Can not be made transparent, vectorized readers (Hive, Spark) have
> > to implement a similar logic.
> >
> > Suggestion:
> >
> > - Align page row boundaries between different columns, i.e. the n-th
> > page of every column should contain the same number of rows.
> > - Filtering logic would be a lot simpler.
> > - Vectorized readers will get index-based filtering without any change
> > required on their side.
> >
> > Response:
> > - Ryan doesn't recommend it. Performance numbers?
> > - Discuss offline or on dev mailing list
> > - Timeline for reaching decision? Within a week. (Gabor already has a
> > working implementation.)
> >
> >
> >
> > Remove Java specific code from parquet-format (Nandor)
> > ------------------------------------------------------
> >
> > Background: Parquet-format contains a few Java classes. Earlier no
> > changes were required in these, but this has changed in recent
> > features, especially with the new column encryption feature, which
> > would add substantial new code.
> >
> > Suggestion (Nandor): Instead of cluttering parquet-format further with
> > java-specific code, move these classes to parquet-mr and deprecate
> > them in parquet-format.
> >
> > What is the motivation behind the status quo? Julien: We may need a
> > different Thrift version in the parquet-thrift binding than in the
> > parquet files themselves. If we move these classes to parquet-mr, we
> > should shade thrift. Additionally, currently a thrift-compiler is only
> > needed for parquet-format, not parquet-mr, this will change. Gabor:
> > Dockerization may help.
> >
> > Julien: We could merge the two repos altogether as well. Gabor: This,
> > however would move the specification into the Java implementation,
> > which would be against the cross-language ideology, so let's keep the
> > separate repo for the format. Zoltan: Other language binding should
> > also consider directly using it instead of copying parquet.thrift into
> > their source code.
> >
> >
> >
> > Column encryption (Gidon)
> > -------------------------
> >
> > Under development:
> > - Key management API (doesn’t provide E2E key management) (PARQUET-1373)
> > - Anonymization and data masking (PARQUET-1376)
> >
> > Java PRs under review:
> > - https://github.com/apache/parquet-mr/pull/471
> > - https://github.com/apache/parquet-mr/pull/472
> >
> > C++ PR:
> > - https://github.com/apache/parquet-cpp/pull/475
> >
> >
> > We need more testing (both unit tests and interop tests between Java and C++).
> >
> >
> >
> > Parquet-format release (Zoltan)
> > -------------------------------
> >
> > Suggested scope (Zoltan):
> > - Column encryption
> > - Nanosec precision
> > - Anything else?
> >
> > Discussion:
> > - Nothing else to add.
> > - Wes welcomes the nano precision, will be needed in parquet-cpp as well.
> >
> >
> >
> > C++ mono-repo: merging Arrow and parquet-cpp (Wes)
> > --------------------------------------------------
> >
> >
> > Background: duplicated CI system and codebase, circular dependencies
> > between libraries
> >
> > Suggestion (Wes): move parquet-cpp into arrow codebase. Details can be
> > read here:
> > https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> >
> >
> > Resolution: No objections but no final decision either, vote on the
> > parquet list:
> > https://lists.apache.org/thread.html/53f77f9f1f04b97709a0286db1b73a49b7f1541d8f8b2cb32db5c922@%3Cdev.parquet.apache.org%3E
