The JIRA ticket to track stream joins can be found at https://issues.apache.org/jira/browse/CALCITE-968.

Thanks
Milinda

On Sat, Nov 14, 2015 at 4:54 PM, Milinda Pathirage <[email protected]> wrote:

> Hi Julian,
>
> Thanks for the response. Will create a jira ticket and come up with some
> samples.
>
> Milinda
>
> On Sat, Nov 14, 2015 at 3:38 AM, Julian Hyde <[email protected]> wrote:
>
>> Short answer: yes, we should allow it.
>>
>> The design falls into 3 parts:
>> * Validation. We should allow any combination: table-table, stream-table
>>   and stream-stream joins, as long as the query can make progress. That
>>   often means that where a stream is involved, the join condition should
>>   involve a monotonic expression. If it is a stream-table join you can
>>   make progress without the monotonic expression, but if there are 2
>>   streams you will need it.
>> * Translation to relational algebra. Inspired by differential calculus’
>>   product rule[1], "stream(x join y)" becomes "x join stream(y) union all
>>   stream(x) join y". Suppose that products is a table (i.e. we do not
>>   receive notifications of new products); then "stream(products)" is
>>   empty. Suppose that orders is both a stream and a table; i.e. a stream
>>   with history. Because stream(products) is empty, "stream(products join
>>   orders)" is simply “products join stream(orders)”. These rewrites would
>>   happen in a DeltaJoinTransposeRule.
>> * Updates to relations. Suppose that the products table is updated two or
>>   three times during each day. How quickly does the end user expect those
>>   updated records to appear in the output of the stream-table join? If
>>   the table is updated at 10am, should the new data be loaded only when
>>   processing transactions from 10am (which might not hit the join until
>>   say 10:07am)? There is no ‘right answer’ here; we should offer the end
>>   user a choice of policies. A good basic policy would be “cache for no
>>   more than T seconds” or “cache as long as you like” but give a manual
>>   way to flush the cache.
>>
>> Can you please log a jira case to track this? Next step would be to write
>> some sample queries and decide whether they are valid.
>>
>> Julian
>>
>> [1] https://en.wikipedia.org/wiki/Product_rule
>>
>> > On Nov 13, 2015, at 9:35 PM, Milinda Pathirage <[email protected]> wrote:
>> >
>> > Hi devs,
>> >
>> > Current SqlValidatorImpl doesn't allow queries like the following:
>> >
>> > select stream orders.orderId, orders.productId, products.name from
>> > orders join products on orders.productId = products.id
>> >
>> > if 'products' is a relation. This query fails at the modality check.
>> > But I am not sure whether fixing (or changing) the modality checking
>> > logic is enough to solve this. Do we need to change planner rules as
>> > well? Really appreciate any ideas on this.
>> >
>> > Thanks
>> > Milinda
>> >
>> > p.s. I am trying to get this base case working where every element from
>> > a stream is joined with a relation. stream-to-stream joins require
>> > changes to the parser as well to support windowing. That's my
>> > understanding, Julian may have better ideas.
>> >
>> > --
>> > Milinda Pathirage
>> >
>> > PhD Student | Research Assistant
>> > School of Informatics and Computing | Data to Insight Center
>> > Indiana University
>> >
>> > twitter: milindalakmal
>> > skype: milinda.pathirage
>> > blog: http://milinda.pathirage.org
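
To make the product-rule rewrite from the thread concrete, here is a minimal Python sketch. It is not Calcite code and does not show how DeltaJoinTransposeRule would be implemented. The names `join` and `delta_join` are hypothetical, relations are modelled as multisets of (key, value) rows, and only insertions are handled. One assumption is added that the thread does not state: the `x` in the first term is read as the state after the new rows arrive, so that rows pairing a new `x` row with a new `y` row are counted exactly once.

```python
from collections import Counter


def join(left, right):
    """Multiset equi-join of (key, value) rows on the key."""
    out = Counter()
    for (lk, lv), ln in left.items():
        for (rk, rv), rn in right.items():
            if lk == rk:
                out[(lk, lv, rv)] += ln * rn
    return out


def delta_join(x_old, dx, y_old, dy):
    """Rows added to (x join y) when dx and dy are inserted.

    Mirrors "x join stream(y) union all stream(x) join y", taking x as the
    post-insert state in the first term (an assumption of this sketch).
    """
    x_new = x_old + dx
    return join(x_new, dy) + join(dx, y_old)


# products is a plain table: no notifications, so its delta is empty.
products = Counter({(1, "paint"): 1, (2, "brush"): 1})
no_new_products = Counter()

# orders is a stream with history; rows are (productId, orderId).
orders_old = Counter({(1, "o-100"): 1})
new_orders = Counter({(2, "o-101"): 1, (1, "o-102"): 1})

incremental = delta_join(products, no_new_products, orders_old, new_orders)
recomputed = join(products, orders_old + new_orders) - join(products, orders_old)
```

Because the products delta is empty, `incremental` reduces to `join(products, new_orders)`, which is the "products join stream(orders)" form described in the thread, and it matches `recomputed`, the difference between the full join after and before the new orders.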

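
The two cache policies suggested in the thread ("cache for no more than T seconds" and "cache as long as you like" with a manual flush) can be sketched as follows. This is only an illustration of the policy choice, not a proposed Calcite API: the class `RefreshingTable`, its `loader` callback, and the `max_age_seconds` and `clock` parameters are all hypothetical names.

```python
import time


class RefreshingTable:
    """Caches the table side of a stream-table join under a simple policy."""

    def __init__(self, loader, max_age_seconds=None, clock=time.monotonic):
        self._loader = loader            # hypothetical callback that reads the table
        self._max_age = max_age_seconds  # None means "cache as long as you like"
        self._clock = clock
        self._rows = None
        self._loaded_at = None

    def flush(self):
        """Manual flush: the next rows() call reloads the table."""
        self._rows = None

    def rows(self):
        now = self._clock()
        expired = (
            self._rows is not None
            and self._max_age is not None
            and now - self._loaded_at > self._max_age
        )
        if self._rows is None or expired:
            self._rows = self._loader()
            self._loaded_at = now
        return self._rows
```

A join operator would call `rows()` for each incoming stream record. With `max_age_seconds=60`, an update made to products at 10am would be visible to the join no later than about a minute afterwards; with the default `None`, it becomes visible only after someone calls `flush()`.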