In terms of reaching a decision on any code or design changes, including this one, I'd suggest going without formal votes. The voting process for code modifications between choices A and B doesn't necessarily end with a decision for A or B -- a single (qualified) -1 vote is a veto and cannot be overridden [1]. Said differently, the guideline is that code changes should be made by consensus, not by one group outvoting another. I'd like to avoid setting such a precedent; we should try to drive consensus, as opposed to attempting to outvote another part of the community.
In this particular case, we have had a great discussion. Many contributors brought different perspectives. Consequently, some opinions have likely changed. At this point, someone should summarize the arguments, try to critique them from a neutral standpoint, and suggest a refined proposal that takes these perspectives into account. If nobody objects in a short time, we should consider this decided.

[ I can certainly help here, but I'd love to see somebody else do it! ]

[1] http://www.apache.org/foundation/voting.html

On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers wrote:

> I also like Distinct since it doesn't make it sound like it modifies any
> underlying collection. RemoveDuplicates makes it sound like the duplicates
> are removed, rather than a new PCollection without duplicates being
> returned.
>
> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré wrote:
>
> > Agree. It was more a transition proposal.
> >
> > Regards
> > JB
> >
> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw wrote:
> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré wrote:
> > >> And what about use RemoveDuplicates and create an alias Distinct ?
> > >
> > >I'd really like to avoid (long term) aliases--you end up having to
> > >document (and maintain) them both, and it adds confusion as to which
> > >one to use (especially if they every diverge), and means searching for
> > >one or the other yields half the results.
> > >
> > >> It doesn't break the API and would address both SQL users and more
> > >> "big data" users.
> > >>
> > >> My $0.01 ;)
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin wrote:
> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> > >>>preference:
> > >>>
> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords are in
> > >>>the Javadoc. This reduces churn on our users and is honestly pretty
> > >>>dang descriptive.
> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and likely
> > >>>less clear otherwise. This is a backwards-incompatible API change, so
> > >>>we should do it before we go stable.
> > >>>
> > >>>I am not super strong that 1 > 2, but I am very strong that
> > >>>"Distinct" >>> "MakeDistinct" or and "RemoveDuplicates" >>> "AvoidDuplicate".
> > >>>
> > >>>Dan
> > >>>
> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles wrote:
> > >>>
> > >>>> The precedent that we use verbs has many exceptions. We have
> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even
> > >>>> include Sum (at least when I read it).
> > >>>>
> > >>>> Historical note: the predilection towards verbs is from the Google
> > >>>> Style Guide for Java method names
> > >>>> <https://google.github.io/styleguide/javaguide.html#s5.2.3-method-names>,
> > >>>> which states "Method names are typically verbs or verb phrases". But
> > >>>> even in Google code there are lots of exceptions when it makes sense,
> > >>>> like Guava's Iterables.any(), Iterables.all(), Iterables.toArray(),
> > >>>> the entire Predicates module, etc. Just an aside; Beam isn't Google
> > >>>> code. I suggest we use our judgment rather than a policy.
> > >>>>
> > >>>> I think "Distinct" is one of those exceptions. It is a standard
> > >>>> widespread name and also reads better as an adjective. I prefer it,
> > >>>> but also don't care strongly enough to change it or to change it
> > >>>> back :-)
> > >>>>
> > >>>> If we must have a verb, I like it as-is more than MakeDistinct and
> > >>>> AvoidDuplicate.
> > >>>>
> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson wrote:
> > >>>>
> > >>>> > My original thought for this change was that Crunch uses the class
> > >>>> > name Distinct. SQL also uses the keyword distinct.
> > >>>> >
> > >>>> > Maybe the rule should be changed to adjectives or verbs depending
> > >>>> > on the context.
> > >>>> >
> > >>>> > Using a verb to describe this class really doesn't connote what the
> > >>>> > class does as succinctly as the adjective.
> > >>>> >
> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian wrote:
> > >>>> >
> > >>>> > > Hello,
> > >>>> > >
> > >>>> > > First of all, thank you to Daniel, Robert and Jesse for their
> > >>>> > > review on this: https://issues.apache.org/jira/browse/BEAM-239
> > >>>> > >
> > >>>> > > A point that came up was using verbs explicitly for Transforms.
> > >>>> > > Here is the PR: https://github.com/apache/incubator-beam/pull/1164
> > >>>> > >
> > >>>> > > Posting it to help understand if we have a consensus for it and
> > >>>> > > if yes, we could perhaps document it for future changes.
> > >>>> > >
> > >>>> > > Thank you.
> > >>>> > >
> > >>>> > > --
> > >>>> > > Neelesh Srinivas Salian
> > >>>> > > Engineer
