Lady Kalamari,

The plan looks good.

To test whether the data is partitioned there: If you have the optimizer
plan, make sure the global properties have a partitioning property of
"PARTITIONED_HASH".

Thanks,
Stephan

On Wed, Jul 15, 2015 at 2:07 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:

> Hi,
>
> thank you Stephan!
>
> Here's the missing part of the plan: http://i.imgur.com/N861tg1.png
> There is one hash partition / sort. Is this what you're talking about?
>
> Regarding your second point, how can I test if the data is known to be
> partitioned at the end?
>
> -Vasia.
>
> On 15 July 2015 at 13:13, Stephan Ewen <se...@apache.org> wrote:
>
> > Hey Vasia!
> >
> > Sorry for the late response... Thanks for pinging again!
> >
> > The optimizer is acting a little funky here - seems an artifact of the
> > "properties" optimization.
> >
> > -> The initial join needs to be partitioned and sorted. Can you check
> > whether one partitioning and sorting happens before the iteration? That
> > part is cut off in the screenshot you sent. It must be either on the
> > input of the iteration, or the output.
> >
> > -> The iteration needs to make sure it leaves the data partitioned and
> > sorted. There is a "re-sorting" operator at the end ("Rebuild Workset
> > Properties"), but it does not partition. The test should make sure the
> > data is known to be partitioned at the very end of the iteration (after
> > the "Rebuild Workset Properties" operator). This is probably true, if the
> > join has some forward field annotation.
> >
> > We can have a quick skype chat later, if you have more questions...
> >
> > Greetings,
> > Stephan
> >
> > On Wed, Jul 15, 2015 at 12:08 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
> >
> > > Hey,
> > >
> > > any input on this? or a hint? or where to look to figure this out by
> > > myself?
> > >
> > > Thanks!
> > > -Vasia.
> > >
> > > On 7 July 2015 at 15:20, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
> > >
> > > > Hello to my squirrels,
> > > >
> > > > I've started looking into FLINK-1943
> > > > <https://issues.apache.org/jira/browse/FLINK-1943> and I need some
> > > > help to understand what to test and how to do it properly.
> > > >
> > > > In the corresponding Spargel compiler test, the following
> > > > functionality is checked:
> > > >
> > > > 1. sink: the ship strategy is FORWARD and the parallelism is correct
> > > > 2. iteration: degree of parallelism
> > > > 3. solution set join: parallelism and input1 ship strategy is
> > > > PARTITION_HASH
> > > > 4. workset join: parallelism, input1 (edges) ship strategy is
> > > > PARTITION_HASH and cached, input2 (workset) ship strategy is FORWARD
> > > > 5. check that the initial partitioning is pushed out of the loop
> > > > 6. check that the initial workset sort is outside the loop
> > > >
> > > > I have been able to verify 1-4 of the above for the GSA iteration
> > > > plan, but I'm not sure how to check (5) and (6) or whether they are
> > > > expected to hold in the GSA case.
> > > >
> > > > In [1] you can see what the GSA iteration operators look like and in
> > > > [2] you can see what the visualizer tool generates for the GSA
> > > > connected components.
> > > >
> > > > Any pointers would be greatly appreciated!
> > > >
> > > > Cheers,
> > > > Vasia.
> > > >
> > > > [1]:
> > > > https://docs.google.com/drawings/d/1tiNQeOphWtkNXTGlnDJ3Ipanh0Tm2R8sHe8XNyTnf98/edit?usp=sharing
> > > > [2]: http://imgur.com/GQZ48ZI