Hi, We get back to 'distributed join' feature. I resurrected branch 'ignite-1232', did brief review of changes, added more tests and analysed some failures. Here are issues identified so far, *Sergi*, could you please comment:
- it seems current code does not properly handle case when custom AffinityKeyMapper is set. As I understand when custom AffinityKeyMapper is used, then it is not possible to reason about affinity key field? - incorrect result when join partitioned and replicated caches. Schema is: Account -> Person -> Organization. Person is REPLICATED cache. Plan for this join query 'from Organization o, Person p, Account a where p.orgId = o._key and p._key = a.personId': SELECT O.NAME AS __C0, P._KEY AS __C1, P.NAME AS __C2, A.NAME AS __C3 FROM "org".ORGANIZATIONO /* "org".ORGANIZATION.__SCAN_ */ INNER JOIN "person".PERSON P /* *batched:broadcast* "person".PERSON.__SCAN_ */ ON 1=1 /* WHERE P.ORGID = O._KEY */ INNER JOIN "acc".ACCOUNT A /* batched:broadcast "acc".ACCOUNT.__SCAN_ */ ON 1=1 WHERE (P.ORGID = O._KEY) AND (P._KEY = A.PERSONID) Look like here distributed join is erroneously used for 'ORGANIZATION/PERSON' join, regual local join can be used since PERSON in replicated cache. As I understand DistributedLookupBatch should never be used to get data from index on REPLICATED cache? I added minimal test to reproduce this: IgniteCacheJoinPartitionedAndReplicatedTest. - I noticed that a lot of data can be transferred when join condition does not use index. For example there are Person[ogrName] and Organization[name] caches (there are no indexes on orgName, name), try join 'Organization o, Person p where o.name=p.name'.If on some node there are 5 Organizations then DistributedLookupBatch.addSearchRows(SearchRow firstRow, SearchRow lastRow) will be called 5 times with parameters firstRow=null, lastRow=null (which means 'get all rows from index'). As result GridH2IndexRangeRequest created by this DistributedLookupBatch will contain 5 identical GridH2RowRangeBounds[first=null, last=null] and related GridH2IndexRangeResponse will contain 5 identical GridH2RowRanges containing all index rows. It seems this should be somehow optimized? I added test with such model and query - IgniteCacheJoinNoIndexTest (test pass, but can be used to debug query requests/response). - another observation related to DistributedLookupBatch.addSearchRows(SearchRow firstRow, SearchRow lastRow): it is possible that this method will be called multiple times with the same parameters (for example when 'join Person, Organization where p.ordId=o.id', then 'addSearchRows' can be called for the same 'orgId' multiple times). Does it make sense to try optimize this? - our regular sql queries benchmarks show performance drop ~6%, I'll try to investigate Thanks!