Thanks everyone for all the inputs. It's really encouraging for a new contributor, as myself, to get valuable input and mentoring (like on this thread) and, in turn, help make the community better.
On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré <[email protected]> wrote: > You did well ! It's an interesting discussion we have and it's great to > have it on the mailing list (better than in Jira or PR comments IMHO). > > Thanks ! > > Regards > JB > > > > On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw > <[email protected]> wrote: > >+1 to all Dan says. > > > >I only brought this up because it seemed new contributors (yay) > >jumping in and renaming a core transform based on "Something to > >consider" deserved a couple more more eyeballs, but didn't intend for > >it to become a big deal. > > > >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin > ><[email protected]> wrote: > >> Folks, I don't think this needs to be a "vote". This is just not that > >big a > >> deal :). It is important to be transparent and have these discussions > >on > >> the list, which is why we brought it here from GitHub/JIRA, but at > >the end > >> of the day I hope that a small group of committers and developers can > >> assess "good enough" consensus for these minor issues. > >> > >> Here's my assessment: > >> * We don't really have any rules about naming transforms. "Should be > >a > >> verb" is a sort of guiding principle inherited from the Google Flume > >> project from which Dataflow evolved, but honestly we violate this > >rule for > >> clarity all over the place. ("Values", for example). > >> * The "Big Data" community is significantly more familiar with the > >concept > >> of Distinct -- Jesse, who filed the original JIRA, is a good example > >here. > >> * Finally, nobody feels very strongly. We could argue minor points of > >each > >> solution, but at the end of the day I don't think anyone wants to > >block a > >> change. > >> > >> Let's go with Distinct. It's important to align Beam with the open > >source > >> big data community. (And thanks Jesse, our newest (*tied) committer, > >for > >> pushing us in the right direction!) > >> > >> Jesse, can you please take charge of wrapping up the PR and merging > >it? > >> > >> Thanks! > >> Dan > >> > >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré > ><[email protected]> > >> wrote: > >> > >>> Just to clarify. Davor is right for a code modification change: -1 > >means a > >>> veto. > >>> I meant that -1 is not a veto for a release vote. > >>> > >>> Anyway, even if it's not a formal code, we can have a discussion > >with > >>> "options" a,b and c. > >>> > >>> Regards > >>> JB > >>> > >>> > >>> > >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci > ><[email protected]> > >>> wrote: > >>> >In terms of reaching a decision on any code or design changes, > >>> >including > >>> >this one, I'd suggest going without formal votes. Voting process > >for > >>> >code > >>> >modifications between choices A and B doesn't necessarily end with > >a > >>> >decision A or B -- a single (qualified) -1 vote is a veto and > >cannot be > >>> >overridden [1]. Said differently, the guideline is that code > >changes > >>> >should > >>> >be made by consensus; not by one group outvoting another. I'd like > >to > >>> >avoid > >>> >setting such precedent; we should try to drive consensus, as > >opposed to > >>> >attempting to outvote another part of the community. > >>> > > >>> >In this particular case, we have had a great discussion. Many > >>> >contributors > >>> >brought different perspectives. Consequently, some opinions have > >been > >>> >likely changed. At this point, someone should summarize the > >arguments, > >>> >try > >>> >to critique them from a neutral standpoint, and suggest a refined > >>> >proposal > >>> >that takes these perspectives into account. If nobody objects in a > >>> >short > >>> >time, we should consider this decided. [ I can certainly help here, > >but > >>> >I'd > >>> >love to see somebody else do it! ] > >>> > > >>> >[1] http://www.apache.org/foundation/voting.html > >>> > > >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers > >>> ><[email protected]> > >>> >wrote: > >>> > > >>> >> I also like Distinct since it doesn't make it sound like it > >modifies > >>> >any > >>> >> underlying collection. RemoveDuplicates makes it sound like the > >>> >duplicates > >>> >> are removed, rather than a new PCollection without duplicates > >being > >>> >> returned. > >>> >> > >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré > ><[email protected]> > >>> >> wrote: > >>> >> > >>> >> > Agree. It was more a transition proposal. > >>> >> > > >>> >> > Regards > >>> >> > JB > >>> >> > > >>> >> > > >>> >> > > >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw > >>> >> > <[email protected]> wrote: > >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré > >>> >> > ><[email protected]> wrote: > >>> >> > >> And what about use RemoveDuplicates and create an alias > >Distinct > >>> >? > >>> >> > > > >>> >> > >I'd really like to avoid (long term) aliases--you end up > >having to > >>> >> > >document (and maintain) them both, and it adds confusion as to > >>> >which > >>> >> > >one to use (especially if they every diverge), and means > >searching > >>> >for > >>> >> > >one or the other yields half the results. > >>> >> > > > >>> >> > >> It doesn't break the API and would address both SQL users > >and > >>> >more > >>> >> > >"big data" users. > >>> >> > >> > >>> >> > >> My $0.01 ;) > >>> >> > >> > >>> >> > >> Regards > >>> >> > >> JB > >>> >> > >> > >>> >> > >> > >>> >> > >> > >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin > >>> >> > ><[email protected]> wrote: > >>> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing > >>> >> > >>>preference: > >>> >> > >>> > >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important > >keywords > >>> >are > >>> >> > >in > >>> >> > >>>the > >>> >> > >>>Javadoc. This reduces churn on our users and is honestly > >pretty > >>> >dang > >>> >> > >>> descriptive. > >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user > >and > >>> >> > >likely > >>> >> > >>>less clear otherwise. This is a backwards-incompatible API > >>> >change, so > >>> >> > >>>we > >>> >> > >>>should do it before we go stable. > >>> >> > >>> > >>> >> > >>>I am not super strong that 1 > 2, but I am very strong that > >>> >> > >"Distinct" > >>> >> > >>>>>> > >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>> > >"AvoidDuplicate". > >>> >> > >>> > >>> >> > >>>Dan > >>> >> > >>> > >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles > >>> >> > >>><[email protected]> > >>> >> > >>>wrote: > >>> >> > >>> > >>> >> > >>>> The precedent that we use verbs has many exceptions. We > >have > >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I > >>> >would > >>> >> > >even > >>> >> > >>>> include Sum (at least when I read it). > >>> >> > >>>> > >>> >> > >>>> Historical note: the predilection towards verbs is from > >the > >>> >Google > >>> >> > >>>Style > >>> >> > >>>> Guide for Java method names > >>> >> > >>>> > >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5. > >>> >> 2.3-method-names > >>> >> > >, > >>> >> > >>>> which states "Method names are typically verbs or verb > >>> >phrases". > >>> >> > >But > >>> >> > >>>even > >>> >> > >>>> in Google code there are lots of exceptions when it makes > >>> >sense, > >>> >> > >like > >>> >> > >>>> Guava's > >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the > >>> >entire > >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google > >code. > >>> >I > >>> >> > >>>suggest we > >>> >> > >>>> use our judgment rather than a policy. > >>> >> > >>>> > >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a > >>> >standard > >>> >> > >>>widespread > >>> >> > >>>> name and also reads better as an adjective. I prefer it, > >but > >>> >also > >>> >> > >>>don't > >>> >> > >>>> care strongly enough to change it or to change it back :-) > >>> >> > >>>> > >>> >> > >>>> If we must have a verb, I like it as-is more than > >MakeDistinct > >>> >and > >>> >> > >>>> AvoidDuplicate. > >>> >> > >>>> > >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson > >>> >> > >>><[email protected]> > >>> >> > >>>> wrote: > >>> >> > >>>> > >>> >> > >>>> > My original thought for this change was that Crunch uses > >the > >>> >> > >class > >>> >> > >>>name > >>> >> > >>>> > Distinct. SQL also uses the keyword distinct. > >>> >> > >>>> > > >>> >> > >>>> > Maybe the rule should be changed to adjectives or verbs > >>> >depending > >>> >> > >>>on the > >>> >> > >>>> > context. > >>> >> > >>>> > > >>> >> > >>>> > Using a verb to describe this class really doesn't > >connote > >>> >what > >>> >> > >the > >>> >> > >>>class > >>> >> > >>>> > does as succinctly as the adjective. > >>> >> > >>>> > > >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian > >>> >> > >>><[email protected]> > >>> >> > >>>> > wrote: > >>> >> > >>>> > > >>> >> > >>>> > > Hello, > >>> >> > >>>> > > > >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse > >for > >>> >their > >>> >> > >>>review on > >>> >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239 > >>> >> > >>>> > > > >>> >> > >>>> > > A point that came up was using verbs explicitly for > >>> >Transforms. > >>> >> > >>>> > > Here is the PR: > >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164 > >>> >> > >>>> > > > >>> >> > >>>> > > Posting it to help understand if we have a consensus > >for > >>> >it and > >>> >> > >>>if yes, > >>> >> > >>>> > we > >>> >> > >>>> > > could perhaps document it for future changes. > >>> >> > >>>> > > > >>> >> > >>>> > > Thank you. > >>> >> > >>>> > > > >>> >> > >>>> > > -- > >>> >> > >>>> > > Neelesh Srinivas Salian > >>> >> > >>>> > > Engineer > >>> >> > >>>> > > > >>> >> > >>>> > > >>> >> > >>>> > >>> >> > > >>> >> > >>> > -- Neelesh Srinivas Salian Customer Operations Engineer
