Created: https://issues.apache.org/jira/browse/APEXMALHAR-2488

~ Bhupesh


_______________________________________________________

Bhupesh Chawda

E: bhup...@datatorrent.com | Twitter: @bhupeshsc

www.datatorrent.com  |  apex.apache.org



On Tue, May 9, 2017 at 8:47 PM, Bhupesh Chawda <bhup...@datatorrent.com>
wrote:

> Looks like it would be okay to remove Join Impl 1 from Malhar.
> The windowed merge implementation can be worked on and simplified to
> address simpler use cases and improve ease of use.
>
> Before proceeding with this, it would be good to hear what other community
> members think.
> I will proceed with creating the JIRAs and PR if there is no response in a
> couple of days.
>
> ~ Bhupesh
>
> On Sat, May 6, 2017 at 11:07 PM, Thomas Weise <t...@apache.org> wrote:
>
>> -->
>>
>> On Wed, May 3, 2017 at 2:59 AM, Bhupesh Chawda <bhup...@datatorrent.com>
>> wrote:
>>
>> > The main difference is in the implementations of managed state that are
>> > used in the two join impls.
>> > The advantage mainly comes from the fact that Join Impl 1 uses
>> > ManagedTimeStateImpl (key buckets + time buckets), while Join Impl 2 is
>> > based on the other two implementations (each with the notion of either a
>> > key bucket or a time bucket).
>> >
>>
>> How does it affect performance and scalability? I think that's the key
>> question it comes down to.
>>
>>
>>
>> >
>> > I agree that the windowed version addresses a more generic use case. My
>> > only concern was whether there are use cases or user communities that are
>> > not familiar with the windowed semantics and might prefer the other
>> > implementation instead. Would that warrant keeping the other
>> > implementation around?
>> >
>>
>> It should be possible to create a module or wrapper if the intention is to
>> simplify a specific use case?
>>
>>
>> >
>> >
>> >
>> >
>> > On Fri, Apr 28, 2017 at 10:09 AM, Thomas Weise <t...@apache.org> wrote:
>> >
>> > > There is one more important difference not mentioned:
>> > >
>> > > Join Impl 1 doesn't work and Join Impl 2 does :)
>> > >
>> > > Can you clarify why a (working) Join Impl 1 would perform better? And if
>> > > it is the case, how would the amount of work fixing 1 stack up against
>> > > improving 2?
>> > >
>> > > Join Impl 2 has greater flexibility due to the generalized windowing. If
>> > > everything else is the same, I prefer we put our efforts there.
>> > >
>> > > Thanks,
>> > > Thomas
>> > >
>> > >
>> > >
>> > > On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhup...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Community,
>> > > >
>> > > > Currently the support for join in Malhar is a little fuzzy for the end
>> > > > user. We have multiple implementations:
>> > > >
>> > > >    1. Join Impl 1 - Inner Join implementation, based on managed state
>> > > >    2. Join Impl 2 - Merge operator, windowed implementation, based on
>> > > >    Spillable structures (based on managed state)
>> > > >
>> > > > Following are the differences between the two:
>> > > >
>> > > >    - As the name implies, Join Impl 1 is meant for inner joins, while
>> > > >    Join Impl 2 has generic support for inner as well as outer joins.
>> > > >    - Join Impl 1 supports sliding time windows with support for expiring
>> > > >    old tuples. Join Impl 2 requires an understanding of windowing
>> > > >    concepts and relies on watermarking support to function.
>> > > >    - Judging by the implementations of managed state used by Join Impl 1
>> > > >    and Join Impl 2, it seems that Join Impl 1 would have a performance
>> > > >    advantage over Join Impl 2.
>> > > >
>> > > > The purpose of this email is to see what can be done to simplify the
>> > > > usability of join in Malhar. Following are some options:
>> > > >
>> > > >    1. Keep both implementations, with clear documentation of when to use
>> > > >    each.
>> > > >    2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to improve
>> > > >    performance. Note that even though Join Impl 1 addresses a very
>> > > >    specific use case, it is the most common requirement in streaming
>> > > >    join use cases.
>> > > >    3. Any other option?
>> > > >
>> > > > Thanks.
>> > > >
>> > > > ~ Bhupesh
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>
>
>
