Hi Community,

Currently, the support for joins in Malhar is a little fuzzy for the end user. We have multiple implementations:
1. Join Impl 1 - Inner Join implementation, based on Managed state
2. Join Impl 2 - Merge operator, Windowed implementation, based on Spillable structures (based on managed state)

The differences between the two are as follows:

- As the name implies, Join Impl 1 is meant for inner joins, while Join Impl 2 has generic support for inner as well as outer joins.
- Join Impl 1 supports sliding time windows and expires old tuples. Join Impl 2 requires an understanding of windowing concepts and relies on watermarking support to function.
- Looking at how Join Impl 1 and Join Impl 2 each use managed state, it seems that Join Impl 1 would have a performance advantage over Join Impl 2.

The purpose of this email is to see what can be done to simplify join usability in Malhar. Following are some options:

1. Keep both implementations, with clear documentation of when to use each.
2. Remove Join Impl 1 from Malhar and work on Join Impl 2 to improve its performance. Note that even though Join Impl 1 addresses a very specific use case, that use case is the most common requirement in streaming joins.
3. Any other option?

Thanks.
~ Bhupesh