Created: https://issues.apache.org/jira/browse/APEXMALHAR-2488
~ Bhupesh
_______________________________________________________
Bhupesh Chawda
E: bhup...@datatorrent.com | Twitter: @bhupeshsc
www.datatorrent.com | apex.apache.org

On Tue, May 9, 2017 at 8:47 PM, Bhupesh Chawda <bhup...@datatorrent.com> wrote:
> Looks like it would be okay to remove Join Impl 1 from Malhar.
> The windowed merge implementation can be worked on and simplified to
> address simpler use cases and ease of use.
>
> Before proceeding with this, it would be good to hear what other community
> members think.
> I will proceed with creating the JIRAs and PR if there is no response in a
> couple of days.
>
> ~ Bhupesh
>
> On Sat, May 6, 2017 at 11:07 PM, Thomas Weise <t...@apache.org> wrote:
>> -->
>>
>> On Wed, May 3, 2017 at 2:59 AM, Bhupesh Chawda <bhup...@datatorrent.com>
>> wrote:
>>
>>> The main difference is in the implementations of managed state that are
>>> used in the two join impls.
>>> The advantage mainly comes from the fact that Join Impl 1 uses
>>> ManagedTimeStateImpl (key buckets + time buckets), while Join Impl 2 is
>>> based on the other two implementations (each of which has the notion of
>>> either a key bucket or a time bucket, but not both).
>>
>> How does it affect performance and scalability? I think that's the key
>> question it comes down to.
>>
>>> I agree that the windowed version addresses a more generic use case. My
>>> only concern was: are there use cases / user communities that are not
>>> familiar with the windowed semantics and might prefer the other
>>> implementation instead? Would that warrant keeping the other
>>> implementation around?
>>
>> It should be possible to create a module or wrapper if the intention is to
>> simplify a specific use case?
>>
>>> On Fri, Apr 28, 2017 at 10:09 AM, Thomas Weise <t...@apache.org> wrote:
>>>
>>>> There is one more important difference not mentioned:
>>>>
>>>> Join Impl 1 doesn't work and Join Impl 2 does :)
>>>>
>>>> Can you clarify why a (working) Join Impl 1 would perform better? And if
>>>> that is the case, how would the amount of work needed to fix 1 compare
>>>> with improving 2?
>>>>
>>>> Join Impl 2 has greater flexibility due to the generalized windowing. If
>>>> everything else is the same, I prefer we put our efforts there.
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>> On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhup...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Community,
>>>>>
>>>>> Currently the support for join in Malhar is a little fuzzy for the end
>>>>> user. We have multiple implementations:
>>>>>
>>>>>    1. Join Impl 1 - Inner Join implementation, based on Managed state
>>>>>    2. Join Impl 2 - Merge operator, Windowed implementation, based on
>>>>>    Spillable structures (based on managed state)
>>>>>
>>>>> The following are the differences between the two:
>>>>>
>>>>>    - As the name implies, Join Impl 1 is meant for inner joins, while
>>>>>    Join Impl 2 has generic support for inner as well as outer joins.
>>>>>    - Join Impl 1 supports sliding time windows, with support for
>>>>>    expiring old tuples. Join Impl 2 requires an understanding of
>>>>>    windowing concepts and relies on watermarks to function.
>>>>>    - Judging by the managed state implementations used by Join Impl 1
>>>>>    and Join Impl 2, it seems that Join Impl 1 would have a performance
>>>>>    advantage over Join Impl 2.
>>>>>
>>>>> The purpose of this email is to see what can be done to simplify join
>>>>> usability in Malhar. The following are some options:
>>>>>
>>>>>    1. Keep both implementations, with clear documentation of the
>>>>>    usability of each.
>>>>>    2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to
>>>>>    improve performance. Note that even though Join Impl 1 addresses a
>>>>>    very specific use case, it is the most common requirement in
>>>>>    streaming join use cases.
>>>>>    3. Any other option?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> ~ Bhupesh
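The semantics described for Join Impl 1 (an inner join over a sliding time window that expires old tuples, with state organized into key buckets plus time buckets, as ManagedTimeStateImpl does) can be illustrated with a minimal in-memory Java sketch. This is an illustration under stated assumptions, not Malhar code: the class SlidingWindowInnerJoin, its Tuple type and the process/expire/storedTuples methods are hypothetical, state lives in plain HashMap/TreeMap instead of managed state, and "within the window" is assumed to mean that the two event times differ by at most windowMillis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Malhar's API: an in-memory inner join of two
// streams (side 0 and side 1) over a sliding time window. Each side keeps
// its tuples in key buckets and, within a key, in time buckets.
public class SlidingWindowInnerJoin {

  // A tuple with a join key, an event time in milliseconds and a payload.
  public static final class Tuple {
    final String key;
    final long time;
    final String value;

    public Tuple(String key, long time, String value) {
      this.key = key;
      this.time = time;
      this.value = value;
    }
  }

  private final long windowMillis;
  private final long bucketMillis;
  // side -> key bucket -> time bucket start -> tuples in that bucket
  private final List<Map<String, TreeMap<Long, List<Tuple>>>> sides = new ArrayList<>();

  public SlidingWindowInnerJoin(long windowMillis, long bucketMillis) {
    this.windowMillis = windowMillis;
    this.bucketMillis = bucketMillis;
    sides.add(new HashMap<>());
    sides.add(new HashMap<>());
  }

  // Joins a tuple arriving on the given side (0 or 1) against the other
  // side, stores it, and returns matches as "left|right" payload pairs.
  public List<String> process(int side, Tuple t) {
    List<String> out = new ArrayList<>();
    TreeMap<Long, List<Tuple>> other = sides.get(1 - side).get(t.key);
    if (other != null) {
      // Visit only time buckets that can overlap [t.time - window, t.time + window].
      for (List<Tuple> bucket : other.subMap(
          t.time - windowMillis - bucketMillis, false, t.time + windowMillis, true).values()) {
        for (Tuple o : bucket) {
          if (Math.abs(o.time - t.time) <= windowMillis) {
            out.add(side == 0 ? t.value + "|" + o.value : o.value + "|" + t.value);
          }
        }
      }
    }
    long bucketStart = t.time - Math.floorMod(t.time, bucketMillis);
    sides.get(side)
        .computeIfAbsent(t.key, k -> new TreeMap<>())
        .computeIfAbsent(bucketStart, b -> new ArrayList<>())
        .add(t);
    return out;
  }

  // Drops every time bucket that ends at or before (now - window), and any
  // key bucket left empty. Expiry is per bucket, not per tuple.
  public void expire(long now) {
    long cutoff = now - windowMillis;
    for (Map<String, TreeMap<Long, List<Tuple>>> side : sides) {
      Iterator<TreeMap<Long, List<Tuple>>> it = side.values().iterator();
      while (it.hasNext()) {
        TreeMap<Long, List<Tuple>> buckets = it.next();
        // A bucket [start, start + bucketMillis) is fully expired when
        // start + bucketMillis <= cutoff.
        buckets.headMap(cutoff - bucketMillis, true).clear();
        if (buckets.isEmpty()) {
          it.remove();
        }
      }
    }
  }

  // Total number of tuples currently held on both sides.
  public int storedTuples() {
    int n = 0;
    for (Map<String, TreeMap<Long, List<Tuple>>> side : sides) {
      for (TreeMap<Long, List<Tuple>> buckets : side.values()) {
        for (List<Tuple> bucket : buckets.values()) {
          n += bucket.size();
        }
      }
    }
    return n;
  }
}
```

Grouping each key's tuples by time bucket is what keeps expiry cheap in this sketch: expire() clears whole buckets through TreeMap.headMap instead of scanning individual tuples, and a probe only touches the buckets that can overlap the window. The trade-off is coarse expiry, since a tuple may linger until its whole bucket leaves the window; the explicit time-distance check in process() keeps the join results correct regardless. The thread notes that the windowed merge operator (Join Impl 2) relies on watermarks instead, which this sketch does not model.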