In terms of reaching a decision on any code or design changes, including
this one, I'd suggest going without formal votes. Voting process for code
modifications between choices A and B doesn't necessarily end with a
decision A or B -- a single (qualified) -1 vote is a veto and cannot be
overridden [1]. Said differently, the guideline is that code changes should
be made by consensus; not by one group outvoting another. I'd like to avoid
setting such precedent; we should try to drive consensus, as opposed to
attempting to outvote another part of the community.

In this particular case, we have had a great discussion. Many contributors
brought different perspectives. Consequently, some opinions have been
likely changed. At this point, someone should summarize the arguments, try
to critique them from a neutral standpoint, and suggest a refined proposal
that takes these perspectives into account. If nobody objects in a short
time, we should consider this decided. [ I can certainly help here, but I'd
love to see somebody else do it! ]

[1] http://www.apache.org/foundation/voting.html

On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers <[email protected]>
wrote:

> I also like Distinct since it doesn't make it sound like it modifies any
> underlying collection. RemoveDuplicates makes it sound like the duplicates
> are removed, rather than a new PCollection without duplicates being
> returned.
>
> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > Agree. It was more a transition proposal.
> >
> > Regards
> > JB
> >
> > ⁣​
> >
> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> > <[email protected]> wrote:
> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> > ><[email protected]> wrote:
> > >> And what about use RemoveDuplicates and create an alias Distinct ?
> > >
> > >I'd really like to avoid (long term) aliases--you end up having to
> > >document (and maintain) them both, and it adds confusion as to which
> > >one to use (especially if they every diverge), and means searching for
> > >one or the other yields half the results.
> > >
> > >> It doesn't break the API and would address both SQL users and more
> > >"big data" users.
> > >>
> > >> My $0.01 ;)
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> ⁣
> > >>
> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> > ><[email protected]> wrote:
> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> > >>>preference:
> > >>>
> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords are
> > >in
> > >>>the
> > >>>Javadoc. This reduces churn on our users and is honestly pretty dang
> > >>> descriptive.
> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and
> > >likely
> > >>>less clear otherwise. This is a backwards-incompatible API change, so
> > >>>we
> > >>>should do it before we go stable.
> > >>>
> > >>>I am not super strong that 1 > 2, but I am very strong that
> > >"Distinct"
> > >>>>>>
> > >>>"MakeDistinct" or and "RemoveDuplicates" >>> "AvoidDuplicate".
> > >>>
> > >>>Dan
> > >>>
> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
> > >>><[email protected]>
> > >>>wrote:
> > >>>
> > >>>> The precedent that we use verbs has many exceptions. We have
> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I would
> > >even
> > >>>> include Sum (at least when I read it).
> > >>>>
> > >>>> Historical note: the predilection towards verbs is from the Google
> > >>>Style
> > >>>> Guide for Java method names
> > >>>>
> > >>><https://google.github.io/styleguide/javaguide.html#s5.
> 2.3-method-names
> > >,
> > >>>> which states "Method names are typically verbs or verb phrases".
> > >But
> > >>>even
> > >>>> in Google code there are lots of exceptions when it makes sense,
> > >like
> > >>>> Guava's
> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the entire
> > >>>> Predicates module, etc. Just an aside; Beam isn't Google code. I
> > >>>suggest we
> > >>>> use our judgment rather than a policy.
> > >>>>
> > >>>> I think "Distinct" is one of those exceptions. It is a standard
> > >>>widespread
> > >>>> name and also reads better as an adjective. I prefer it, but also
> > >>>don't
> > >>>> care strongly enough to change it or to change it back :-)
> > >>>>
> > >>>> If we must have a verb, I like it as-is more than MakeDistinct and
> > >>>> AvoidDuplicate.
> > >>>>
> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
> > >>><[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>> > My original thought for this change was that Crunch uses the
> > >class
> > >>>name
> > >>>> > Distinct. SQL also uses the keyword distinct.
> > >>>> >
> > >>>> > Maybe the rule should be changed to adjectives or verbs depending
> > >>>on the
> > >>>> > context.
> > >>>> >
> > >>>> > Using a verb to describe this class really doesn't connote what
> > >the
> > >>>class
> > >>>> > does as succinctly as the adjective.
> > >>>> >
> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian
> > >>><[email protected]>
> > >>>> > wrote:
> > >>>> >
> > >>>> > > Hello,
> > >>>> > >
> > >>>> > > First of all, thank you to Daniel, Robert and Jesse for their
> > >>>review on
> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239
> > >>>> > >
> > >>>> > > A point that came up was using verbs explicitly for Transforms.
> > >>>> > > Here is the PR:
> > >>>https://github.com/apache/incubator-beam/pull/1164
> > >>>> > >
> > >>>> > > Posting it to help understand if we have a consensus for it and
> > >>>if yes,
> > >>>> > we
> > >>>> > > could perhaps document it for future changes.
> > >>>> > >
> > >>>> > > Thank you.
> > >>>> > >
> > >>>> > > --
> > >>>> > > Neelesh Srinivas Salian
> > >>>> > > Engineer
> > >>>> > >
> > >>>> >
> > >>>>
> >
>

Reply via email to