I agree with Andrew. Recent Parquet spec changes have followed the same practice:
- https://lists.apache.org/thread/gyvqcx9ssxkjlrwogqwy7n4z6ofdm871
- https://lists.apache.org/thread/wgobz41mfldbhqpg9q4mdwypghg2cxg2
- https://lists.apache.org/thread/nlsj0ftxy7y4ov1678rgy5zc7dmogg6q

On Tue, May 14, 2024 at 5:25 PM Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > Would it be reasonable to say specification change requires implementation
> > in two parquet implementations within Apache Parquet project?
>
> I believe this approach is how the Apache Arrow project handles spec
> changes[1] and that process has worked well in my opinion.
>
> Andrew
>
> [1] https://arrow.apache.org/docs/format/Changing.html
>
> On Tue, May 14, 2024 at 5:00 AM Rok Mihevc <rok.mih...@gmail.com> wrote:
>
> > Second Raphael's point.
> > Would it be reasonable to say specification change requires implementation
> > in two parquet implementations within Apache Parquet project?
> >
> > Rok
> >
> > On Tue, May 14, 2024 at 10:50 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > IMHO, it looks more reasonable if a reference implementation is required
> > > to support most (not all) elements from the specification.
> > >
> > > Another question is: should we discuss (and vote for) each candidate
> > > one by one? We can start with parquet-mr which is most well-known
> > > implementation.
> > >
> > > Best,
> > > Gang
> > >
> > > On Tue, May 14, 2024 at 4:41 PM Raphael Taylor-Davies
> > > <r.taylordav...@googlemail.com.invalid> wrote:
> > >
> > > > Potentially it would be helpful to flip the question around. As Andrew
> > > > articulates, a reference implementation is required to implement all
> > > > elements from the specification, and therefore the major consequence of
> > > > labeling parquet-mr thusly would be that any specification change would
> > > > have to be implemented within parquet-mr as part of the standardisation
> > > > process. It would be insufficient for it to be implemented in, for
> > > > example, two of the parquet implementations maintained by the arrow
> > > > project. I personally think that would be a shame and likely exclude
> > > > many people who would otherwise be interested in evolving the parquet
> > > > specification, but think that is at the core of this question.
> > > >
> > > > Kind Regards,
> > > >
> > > > Raphael
> > > >
> > > > On 13/05/2024 20:55, Andrew Lamb wrote:
> > > > > Question: Should we label parquet-mr or any other parquet implementations
> > > > > "reference" implications"?
> > > > >
> > > > > This came up as part of Vinoo's great PR to list different parquet
> > > > > reference implementations[1][2].
> > > > >
> > > > > The term "reference implementation" often has an official connotation. For
> > > > > example the wikipedia definition is "a program that implements all
> > > > > requirements from a corresponding specification. The reference
> > > > > implementation ... should be considered the "correct" behavior of any other
> > > > > implementation of it."[3]
> > > > >
> > > > > Given the close association of parquet-mr to the parquet standard, it is a
> > > > > natural candidate to label as "reference implementation." However, it is
> > > > > not clear to me if there is consensus that it should be thusly labeled.
> > > > >
> > > > > I have a strong opinion that a consensus on this question would be very
> > > > > helpful. I don't actually have a strong opinion about the answer
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1]: https://github.com/apache/parquet-site/pull/53#discussion_r1582882267
> > > > > [2]: https://github.com/apache/parquet-site/pull/53#discussion_r1598283465
> > > > > [3]: https://en.wikipedia.org/wiki/Reference_implementation