Thanks everyone. The PR was merged.
On Thu, Oct 27, 2016 at 11:51 AM, Neelesh Salian <[email protected]> wrote: > Thanks everyone for all the inputs. > It's really encouraging for a new contributor, as myself, to get valuable > input and mentoring (like on this thread) and, in turn, help make the > community better. > > > > On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré <[email protected]> > wrote: > >> You did well ! It's an interesting discussion we have and it's great to >> have it on the mailing list (better than in Jira or PR comments IMHO). >> >> Thanks ! >> >> Regards >> JB >> >> >> >> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw >> <[email protected]> wrote: >> >+1 to all Dan says. >> > >> >I only brought this up because it seemed new contributors (yay) >> >jumping in and renaming a core transform based on "Something to >> >consider" deserved a couple more more eyeballs, but didn't intend for >> >it to become a big deal. >> > >> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin >> ><[email protected]> wrote: >> >> Folks, I don't think this needs to be a "vote". This is just not that >> >big a >> >> deal :). It is important to be transparent and have these discussions >> >on >> >> the list, which is why we brought it here from GitHub/JIRA, but at >> >the end >> >> of the day I hope that a small group of committers and developers can >> >> assess "good enough" consensus for these minor issues. >> >> >> >> Here's my assessment: >> >> * We don't really have any rules about naming transforms. "Should be >> >a >> >> verb" is a sort of guiding principle inherited from the Google Flume >> >> project from which Dataflow evolved, but honestly we violate this >> >rule for >> >> clarity all over the place. ("Values", for example). >> >> * The "Big Data" community is significantly more familiar with the >> >concept >> >> of Distinct -- Jesse, who filed the original JIRA, is a good example >> >here. >> >> * Finally, nobody feels very strongly. We could argue minor points of >> >each >> >> solution, but at the end of the day I don't think anyone wants to >> >block a >> >> change. >> >> >> >> Let's go with Distinct. It's important to align Beam with the open >> >source >> >> big data community. (And thanks Jesse, our newest (*tied) committer, >> >for >> >> pushing us in the right direction!) >> >> >> >> Jesse, can you please take charge of wrapping up the PR and merging >> >it? >> >> >> >> Thanks! >> >> Dan >> >> >> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré >> ><[email protected]> >> >> wrote: >> >> >> >>> Just to clarify. Davor is right for a code modification change: -1 >> >means a >> >>> veto. >> >>> I meant that -1 is not a veto for a release vote. >> >>> >> >>> Anyway, even if it's not a formal code, we can have a discussion >> >with >> >>> "options" a,b and c. >> >>> >> >>> Regards >> >>> JB >> >>> >> >>> >> >>> >> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci >> ><[email protected]> >> >>> wrote: >> >>> >In terms of reaching a decision on any code or design changes, >> >>> >including >> >>> >this one, I'd suggest going without formal votes. Voting process >> >for >> >>> >code >> >>> >modifications between choices A and B doesn't necessarily end with >> >a >> >>> >decision A or B -- a single (qualified) -1 vote is a veto and >> >cannot be >> >>> >overridden [1]. Said differently, the guideline is that code >> >changes >> >>> >should >> >>> >be made by consensus; not by one group outvoting another. I'd like >> >to >> >>> >avoid >> >>> >setting such precedent; we should try to drive consensus, as >> >opposed to >> >>> >attempting to outvote another part of the community. >> >>> > >> >>> >In this particular case, we have had a great discussion. Many >> >>> >contributors >> >>> >brought different perspectives. Consequently, some opinions have >> >been >> >>> >likely changed. At this point, someone should summarize the >> >arguments, >> >>> >try >> >>> >to critique them from a neutral standpoint, and suggest a refined >> >>> >proposal >> >>> >that takes these perspectives into account. If nobody objects in a >> >>> >short >> >>> >time, we should consider this decided. [ I can certainly help here, >> >but >> >>> >I'd >> >>> >love to see somebody else do it! ] >> >>> > >> >>> >[1] http://www.apache.org/foundation/voting.html >> >>> > >> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers >> >>> ><[email protected]> >> >>> >wrote: >> >>> > >> >>> >> I also like Distinct since it doesn't make it sound like it >> >modifies >> >>> >any >> >>> >> underlying collection. RemoveDuplicates makes it sound like the >> >>> >duplicates >> >>> >> are removed, rather than a new PCollection without duplicates >> >being >> >>> >> returned. >> >>> >> >> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré >> ><[email protected]> >> >>> >> wrote: >> >>> >> >> >>> >> > Agree. It was more a transition proposal. >> >>> >> > >> >>> >> > Regards >> >>> >> > JB >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw >> >>> >> > <[email protected]> wrote: >> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré >> >>> >> > ><[email protected]> wrote: >> >>> >> > >> And what about use RemoveDuplicates and create an alias >> >Distinct >> >>> >? >> >>> >> > > >> >>> >> > >I'd really like to avoid (long term) aliases--you end up >> >having to >> >>> >> > >document (and maintain) them both, and it adds confusion as to >> >>> >which >> >>> >> > >one to use (especially if they every diverge), and means >> >searching >> >>> >for >> >>> >> > >one or the other yields half the results. >> >>> >> > > >> >>> >> > >> It doesn't break the API and would address both SQL users >> >and >> >>> >more >> >>> >> > >"big data" users. >> >>> >> > >> >> >>> >> > >> My $0.01 ;) >> >>> >> > >> >> >>> >> > >> Regards >> >>> >> > >> JB >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin >> >>> >> > ><[email protected]> wrote: >> >>> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing >> >>> >> > >>>preference: >> >>> >> > >>> >> >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important >> >keywords >> >>> >are >> >>> >> > >in >> >>> >> > >>>the >> >>> >> > >>>Javadoc. This reduces churn on our users and is honestly >> >pretty >> >>> >dang >> >>> >> > >>> descriptive. >> >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user >> >and >> >>> >> > >likely >> >>> >> > >>>less clear otherwise. This is a backwards-incompatible API >> >>> >change, so >> >>> >> > >>>we >> >>> >> > >>>should do it before we go stable. >> >>> >> > >>> >> >>> >> > >>>I am not super strong that 1 > 2, but I am very strong that >> >>> >> > >"Distinct" >> >>> >> > >>>>>> >> >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>> >> >"AvoidDuplicate". >> >>> >> > >>> >> >>> >> > >>>Dan >> >>> >> > >>> >> >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles >> >>> >> > >>><[email protected]> >> >>> >> > >>>wrote: >> >>> >> > >>> >> >>> >> > >>>> The precedent that we use verbs has many exceptions. We >> >have >> >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I >> >>> >would >> >>> >> > >even >> >>> >> > >>>> include Sum (at least when I read it). >> >>> >> > >>>> >> >>> >> > >>>> Historical note: the predilection towards verbs is from >> >the >> >>> >Google >> >>> >> > >>>Style >> >>> >> > >>>> Guide for Java method names >> >>> >> > >>>> >> >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5. >> >>> >> 2.3-method-names >> >>> >> > >, >> >>> >> > >>>> which states "Method names are typically verbs or verb >> >>> >phrases". >> >>> >> > >But >> >>> >> > >>>even >> >>> >> > >>>> in Google code there are lots of exceptions when it makes >> >>> >sense, >> >>> >> > >like >> >>> >> > >>>> Guava's >> >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the >> >>> >entire >> >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google >> >code. >> >>> >I >> >>> >> > >>>suggest we >> >>> >> > >>>> use our judgment rather than a policy. >> >>> >> > >>>> >> >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a >> >>> >standard >> >>> >> > >>>widespread >> >>> >> > >>>> name and also reads better as an adjective. I prefer it, >> >but >> >>> >also >> >>> >> > >>>don't >> >>> >> > >>>> care strongly enough to change it or to change it back :-) >> >>> >> > >>>> >> >>> >> > >>>> If we must have a verb, I like it as-is more than >> >MakeDistinct >> >>> >and >> >>> >> > >>>> AvoidDuplicate. >> >>> >> > >>>> >> >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson >> >>> >> > >>><[email protected]> >> >>> >> > >>>> wrote: >> >>> >> > >>>> >> >>> >> > >>>> > My original thought for this change was that Crunch uses >> >the >> >>> >> > >class >> >>> >> > >>>name >> >>> >> > >>>> > Distinct. SQL also uses the keyword distinct. >> >>> >> > >>>> > >> >>> >> > >>>> > Maybe the rule should be changed to adjectives or verbs >> >>> >depending >> >>> >> > >>>on the >> >>> >> > >>>> > context. >> >>> >> > >>>> > >> >>> >> > >>>> > Using a verb to describe this class really doesn't >> >connote >> >>> >what >> >>> >> > >the >> >>> >> > >>>class >> >>> >> > >>>> > does as succinctly as the adjective. >> >>> >> > >>>> > >> >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian >> >>> >> > >>><[email protected]> >> >>> >> > >>>> > wrote: >> >>> >> > >>>> > >> >>> >> > >>>> > > Hello, >> >>> >> > >>>> > > >> >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse >> >for >> >>> >their >> >>> >> > >>>review on >> >>> >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239 >> >>> >> > >>>> > > >> >>> >> > >>>> > > A point that came up was using verbs explicitly for >> >>> >Transforms. >> >>> >> > >>>> > > Here is the PR: >> >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164 >> >>> >> > >>>> > > >> >>> >> > >>>> > > Posting it to help understand if we have a consensus >> >for >> >>> >it and >> >>> >> > >>>if yes, >> >>> >> > >>>> > we >> >>> >> > >>>> > > could perhaps document it for future changes. >> >>> >> > >>>> > > >> >>> >> > >>>> > > Thank you. >> >>> >> > >>>> > > >> >>> >> > >>>> > > -- >> >>> >> > >>>> > > Neelesh Srinivas Salian >> >>> >> > >>>> > > Engineer >> >>> >> > >>>> > > >> >>> >> > >>>> > >> >>> >> > >>>> >> >>> >> > >> >>> >> >> >>> >> > > > > -- > Neelesh Srinivas Salian > Customer Operations Engineer > > > > -- Neelesh Srinivas Salian Customer Operations Engineer
