Thanks everyone for all the inputs.
It's really encouraging for a new contributor, as myself, to get valuable
input and mentoring (like on this thread) and, in turn, help make the
community better.



On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré <[email protected]>
wrote:

> You did well ! It's an interesting discussion we have and it's great to
> have it on the mailing list (better than in Jira or PR comments IMHO).
>
> Thanks !
>
> Regards
> JB
>
> ⁣​
>
> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw
> <[email protected]> wrote:
> >+1 to all Dan says.
> >
> >I only brought this up because it seemed new contributors (yay)
> >jumping in and renaming a core transform based on "Something to
> >consider" deserved a couple more more eyeballs, but didn't intend for
> >it to become a big deal.
> >
> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin
> ><[email protected]> wrote:
> >> Folks, I don't think this needs to be a "vote". This is just not that
> >big a
> >> deal :). It is important to be transparent and have these discussions
> >on
> >> the list, which is why we brought it here from GitHub/JIRA, but at
> >the end
> >> of the day I hope that a small group of committers and developers can
> >> assess "good enough" consensus for these minor issues.
> >>
> >> Here's my assessment:
> >> * We don't really have any rules about naming transforms. "Should be
> >a
> >> verb" is a sort of guiding principle inherited from the Google Flume
> >> project from which Dataflow evolved, but honestly we violate this
> >rule for
> >> clarity all over the place. ("Values", for example).
> >> * The "Big Data" community is significantly more familiar with the
> >concept
> >> of Distinct -- Jesse, who filed the original JIRA, is a good example
> >here.
> >> * Finally, nobody feels very strongly. We could argue minor points of
> >each
> >> solution, but at the end of the day I don't think anyone wants to
> >block a
> >> change.
> >>
> >> Let's go with Distinct. It's important to align Beam with the open
> >source
> >> big data community. (And thanks Jesse, our newest (*tied) committer,
> >for
> >> pushing us in the right direction!)
> >>
> >> Jesse, can you please take charge of wrapping up the PR and merging
> >it?
> >>
> >> Thanks!
> >> Dan
> >>
> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré
> ><[email protected]>
> >> wrote:
> >>
> >>> Just to clarify. Davor is right for a code modification change: -1
> >means a
> >>> veto.
> >>> I meant that -1 is not a veto for a release vote.
> >>>
> >>> Anyway, even if it's not a formal code, we can have a discussion
> >with
> >>> "options" a,b and c.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> ⁣
> >>>
> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci
> ><[email protected]>
> >>> wrote:
> >>> >In terms of reaching a decision on any code or design changes,
> >>> >including
> >>> >this one, I'd suggest going without formal votes. Voting process
> >for
> >>> >code
> >>> >modifications between choices A and B doesn't necessarily end with
> >a
> >>> >decision A or B -- a single (qualified) -1 vote is a veto and
> >cannot be
> >>> >overridden [1]. Said differently, the guideline is that code
> >changes
> >>> >should
> >>> >be made by consensus; not by one group outvoting another. I'd like
> >to
> >>> >avoid
> >>> >setting such precedent; we should try to drive consensus, as
> >opposed to
> >>> >attempting to outvote another part of the community.
> >>> >
> >>> >In this particular case, we have had a great discussion. Many
> >>> >contributors
> >>> >brought different perspectives. Consequently, some opinions have
> >been
> >>> >likely changed. At this point, someone should summarize the
> >arguments,
> >>> >try
> >>> >to critique them from a neutral standpoint, and suggest a refined
> >>> >proposal
> >>> >that takes these perspectives into account. If nobody objects in a
> >>> >short
> >>> >time, we should consider this decided. [ I can certainly help here,
> >but
> >>> >I'd
> >>> >love to see somebody else do it! ]
> >>> >
> >>> >[1] http://www.apache.org/foundation/voting.html
> >>> >
> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
> >>> ><[email protected]>
> >>> >wrote:
> >>> >
> >>> >> I also like Distinct since it doesn't make it sound like it
> >modifies
> >>> >any
> >>> >> underlying collection. RemoveDuplicates makes it sound like the
> >>> >duplicates
> >>> >> are removed, rather than a new PCollection without duplicates
> >being
> >>> >> returned.
> >>> >>
> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
> ><[email protected]>
> >>> >> wrote:
> >>> >>
> >>> >> > Agree. It was more a transition proposal.
> >>> >> >
> >>> >> > Regards
> >>> >> > JB
> >>> >> >
> >>> >> > ⁣
> >>> >> >
> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> >>> >> > <[email protected]> wrote:
> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> >>> >> > ><[email protected]> wrote:
> >>> >> > >> And what about use RemoveDuplicates and create an alias
> >Distinct
> >>> >?
> >>> >> > >
> >>> >> > >I'd really like to avoid (long term) aliases--you end up
> >having to
> >>> >> > >document (and maintain) them both, and it adds confusion as to
> >>> >which
> >>> >> > >one to use (especially if they every diverge), and means
> >searching
> >>> >for
> >>> >> > >one or the other yields half the results.
> >>> >> > >
> >>> >> > >> It doesn't break the API and would address both SQL users
> >and
> >>> >more
> >>> >> > >"big data" users.
> >>> >> > >>
> >>> >> > >> My $0.01 ;)
> >>> >> > >>
> >>> >> > >> Regards
> >>> >> > >> JB
> >>> >> > >>
> >>> >> > >> ⁣
> >>> >> > >>
> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> >>> >> > ><[email protected]> wrote:
> >>> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> >>> >> > >>>preference:
> >>> >> > >>>
> >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important
> >keywords
> >>> >are
> >>> >> > >in
> >>> >> > >>>the
> >>> >> > >>>Javadoc. This reduces churn on our users and is honestly
> >pretty
> >>> >dang
> >>> >> > >>> descriptive.
> >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user
> >and
> >>> >> > >likely
> >>> >> > >>>less clear otherwise. This is a backwards-incompatible API
> >>> >change, so
> >>> >> > >>>we
> >>> >> > >>>should do it before we go stable.
> >>> >> > >>>
> >>> >> > >>>I am not super strong that 1 > 2, but I am very strong that
> >>> >> > >"Distinct"
> >>> >> > >>>>>>
> >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>>
> >"AvoidDuplicate".
> >>> >> > >>>
> >>> >> > >>>Dan
> >>> >> > >>>
> >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
> >>> >> > >>><[email protected]>
> >>> >> > >>>wrote:
> >>> >> > >>>
> >>> >> > >>>> The precedent that we use verbs has many exceptions. We
> >have
> >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I
> >>> >would
> >>> >> > >even
> >>> >> > >>>> include Sum (at least when I read it).
> >>> >> > >>>>
> >>> >> > >>>> Historical note: the predilection towards verbs is from
> >the
> >>> >Google
> >>> >> > >>>Style
> >>> >> > >>>> Guide for Java method names
> >>> >> > >>>>
> >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5.
> >>> >> 2.3-method-names
> >>> >> > >,
> >>> >> > >>>> which states "Method names are typically verbs or verb
> >>> >phrases".
> >>> >> > >But
> >>> >> > >>>even
> >>> >> > >>>> in Google code there are lots of exceptions when it makes
> >>> >sense,
> >>> >> > >like
> >>> >> > >>>> Guava's
> >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the
> >>> >entire
> >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google
> >code.
> >>> >I
> >>> >> > >>>suggest we
> >>> >> > >>>> use our judgment rather than a policy.
> >>> >> > >>>>
> >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a
> >>> >standard
> >>> >> > >>>widespread
> >>> >> > >>>> name and also reads better as an adjective. I prefer it,
> >but
> >>> >also
> >>> >> > >>>don't
> >>> >> > >>>> care strongly enough to change it or to change it back :-)
> >>> >> > >>>>
> >>> >> > >>>> If we must have a verb, I like it as-is more than
> >MakeDistinct
> >>> >and
> >>> >> > >>>> AvoidDuplicate.
> >>> >> > >>>>
> >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
> >>> >> > >>><[email protected]>
> >>> >> > >>>> wrote:
> >>> >> > >>>>
> >>> >> > >>>> > My original thought for this change was that Crunch uses
> >the
> >>> >> > >class
> >>> >> > >>>name
> >>> >> > >>>> > Distinct. SQL also uses the keyword distinct.
> >>> >> > >>>> >
> >>> >> > >>>> > Maybe the rule should be changed to adjectives or verbs
> >>> >depending
> >>> >> > >>>on the
> >>> >> > >>>> > context.
> >>> >> > >>>> >
> >>> >> > >>>> > Using a verb to describe this class really doesn't
> >connote
> >>> >what
> >>> >> > >the
> >>> >> > >>>class
> >>> >> > >>>> > does as succinctly as the adjective.
> >>> >> > >>>> >
> >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian
> >>> >> > >>><[email protected]>
> >>> >> > >>>> > wrote:
> >>> >> > >>>> >
> >>> >> > >>>> > > Hello,
> >>> >> > >>>> > >
> >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse
> >for
> >>> >their
> >>> >> > >>>review on
> >>> >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239
> >>> >> > >>>> > >
> >>> >> > >>>> > > A point that came up was using verbs explicitly for
> >>> >Transforms.
> >>> >> > >>>> > > Here is the PR:
> >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164
> >>> >> > >>>> > >
> >>> >> > >>>> > > Posting it to help understand if we have a consensus
> >for
> >>> >it and
> >>> >> > >>>if yes,
> >>> >> > >>>> > we
> >>> >> > >>>> > > could perhaps document it for future changes.
> >>> >> > >>>> > >
> >>> >> > >>>> > > Thank you.
> >>> >> > >>>> > >
> >>> >> > >>>> > > --
> >>> >> > >>>> > > Neelesh Srinivas Salian
> >>> >> > >>>> > > Engineer
> >>> >> > >>>> > >
> >>> >> > >>>> >
> >>> >> > >>>>
> >>> >> >
> >>> >>
> >>>
>



-- 
Neelesh Srinivas Salian
Customer Operations Engineer

Reply via email to