@Fabian: I hope that this is the complete list, correct me if I am wrong. :)

I will then open a small PR with these changes on top of Aljoscha's PR,
which exposes the streaming partitioning.

On Mon, Jun 1, 2015 at 6:01 PM, Stephan Ewen <se...@apache.org> wrote:

> +1
>
> Good list and choices, Marton!
>
> On Mon, Jun 1, 2015 at 5:45 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> > Thanks for bringing up this point!
> >
> > +1 for the renaming.
> > @Marton: Is this a "complete" list, i.e., did you go through both APIs or
> > might there be more methods that are semantically identical but named
> > differently?
> >
> > 2015-06-01 17:31 GMT+02:00 Gyula Fóra <gyf...@apache.org>:
> >
> > > +1 for the changes proposed by Marton (before the release)
> > >
> > > Aljoscha Krettek <aljos...@apache.org> wrote (on Mon, Jun 1, 2015, 16:32):
> > >
> > > > Yes, these renamings make sense. The partitionBy() is not yet in the
> > > > master for streaming, though.
> > > >
> > > > On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> > > > > Looking at the DataSet and DataStream APIs, Aljoscha and I have come
> > > > > to the conclusion that a few methods provide the same functionality
> > > > > but are named differently. These are the following:
> > > > >
> > > > >    1. rebalance (batch) / distribute (streaming): Rebalances the data
> > > > >    sent to the downstream operators, thus distributing it equally.
> > > > >    2. partitionByHash, partitionCustom (batch) / partitionBy
> > > > >    (streaming): Partitioning has just recently been exposed in the
> > > > >    streaming API and is not as refined as the batch one. The streaming
> > > > >    partitionBy is actually partitionByHash.
> > > > >    3. union (batch) / merge, connect (streaming): The streaming merge
> > > > >    does a union of two streams with the same type. Connect is
> > > > >    conceptually different: it provides a way of sharing state between
> > > > >    two streams with potentially different types without mapping them
> > > > >    to a common type and then merging them. This saves latency and an
> > > > >    ugly mapping. The former advantage can be offset by proper operator
> > > > >    chaining, but the ugly mapping would remain if we did not have
> > > > >    connect.
> > > > >
> > > > > To consolidate the naming I would suggest the following:
> > > > >
> > > > >    1. Rename streaming distribute to rebalance.
> > > > >    2. Rename streaming partitionBy to partitionByHash and file a JIRA
> > > > >    for custom partitioning support for streaming.
> > > > >    3. Rename streaming merge to union, leave streaming connect as it
> > > > >    is.
> > > >
> > >
> >
>
