Hi, thank you Stephan!
Here's the missing part of the plan: http://i.imgur.com/N861tg1.png
There is one hash partition / sort. Is this what you're talking about?

Regarding your second point, how can I test whether the data is known to be partitioned at the end?

-Vasia.

On 15 July 2015 at 13:13, Stephan Ewen <se...@apache.org> wrote:

> Hey Vasia!
>
> Sorry for the late response... Thanks for pinging again!
>
> The optimizer is acting a little funky here; this seems to be an artifact of
> the "properties" optimization.
>
> -> The initial join needs to be partitioned and sorted. Can you check
> whether one partitioning and sorting happens before the iteration? That
> part is cut off in the screenshot you sent. It must be either on the input
> of the iteration or on the output.
>
> -> The iteration needs to make sure it leaves the data partitioned and
> sorted. There is a "re-sorting" operator at the end ("Rebuild Workset
> Properties"), but it does not partition. The test should make sure the data
> is known to be partitioned at the very end of the iteration (after the
> "Rebuild Workset Properties" operator). This is probably true if the join
> has some forward field annotation.
>
> We can have a quick Skype chat later, if you have more questions...
>
> Greetings,
> Stephan
>
>
> On Wed, Jul 15, 2015 at 12:08 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>
> > Hey,
> >
> > any input on this? Or a hint? Or where to look to figure this out by
> > myself?
> >
> > Thanks!
> > -Vasia.
> >
> > On 7 July 2015 at 15:20, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
> >
> > > Hello to my squirrels,
> > >
> > > I've started looking into FLINK-1943
> > > <https://issues.apache.org/jira/browse/FLINK-1943> and I need some help
> > > to understand what to test and how to do it properly.
> > >
> > > In the corresponding Spargel compiler test, the following functionality
> > > is checked:
> > >
> > > 1. sink: the ship strategy is FORWARD and the parallelism is correct
> > > 2. iteration: degree of parallelism
> > > 3. solution set join: parallelism, and the input1 ship strategy is
> > > PARTITION_HASH
> > > 4. workset join: parallelism; the input1 (edges) ship strategy is
> > > PARTITION_HASH and cached; the input2 (workset) ship strategy is FORWARD
> > > 5. check that the initial partitioning is pushed out of the loop
> > > 6. check that the initial workset sort is outside the loop
> > >
> > > I have been able to verify 1-4 of the above for the GSA iteration plan,
> > > but I'm not sure how to check (5) and (6), or whether they are expected
> > > to hold in the GSA case.
> > >
> > > In [1] you can see what the GSA iteration operators look like, and in [2]
> > > you can see what the visualizer tool generates for GSA connected
> > > components.
> > >
> > > Any pointers would be greatly appreciated!
> > >
> > > Cheers,
> > > Vasia.
> > >
> > > [1]: https://docs.google.com/drawings/d/1tiNQeOphWtkNXTGlnDJ3Ipanh0Tm2R8sHe8XNyTnf98/edit?usp=sharing
> > > [2]: http://imgur.com/GQZ48ZI
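Stephan's remark that the data is probably still known to be partitioned at the end of the iteration "if the join has some forward field annotation" can be illustrated with a toy model. This is a hypothetical sketch, not Flink's optimizer API: `GlobalProps`, `propagate`, and `known_partitioned_at_end` are made-up names, and a real compiler test would inspect the nodes of the optimized plan instead. Each operator is modeled only by the input fields it forwards unchanged, and the sketch checks whether hash partitioning on the key is still known after the last operator.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class GlobalProps:
    """Hypothetical stand-in for an optimizer's global data properties.

    hash_partitioned_on holds the field positions the data is known to be
    hash-partitioned on, or None if no partitioning is known.
    """
    hash_partitioned_on: Optional[FrozenSet[int]]


def propagate(props: GlobalProps, forwarded: Dict[int, int]) -> GlobalProps:
    """Push known partitioning through one operator.

    forwarded maps input field positions to the output positions they are
    copied to unchanged (a toy version of a forward field annotation).
    Partitioning survives only if every key field is forwarded.
    """
    keys = props.hash_partitioned_on
    if keys is None or not all(k in forwarded for k in keys):
        return GlobalProps(hash_partitioned_on=None)
    return GlobalProps(hash_partitioned_on=frozenset(forwarded[k] for k in keys))


def known_partitioned_at_end(start: GlobalProps,
                             operators: List[Dict[int, int]],
                             key: Iterable[int]) -> bool:
    """Walk a chain of operators and check the partitioning at the very end."""
    props = start
    for forwarded in operators:
        props = propagate(props, forwarded)
    return props.hash_partitioned_on == frozenset(key)


if __name__ == "__main__":
    partitioned = GlobalProps(frozenset({0}))
    # A join annotated as forwarding field 0 unchanged, followed by a local
    # re-sort (like "Rebuild Workset Properties") that forwards every field.
    annotated_join = {0: 0}
    resort = {0: 0, 1: 1}
    print(known_partitioned_at_end(partitioned, [annotated_join, resort], {0}))
    # Without the annotation the optimizer can no longer assume partitioning.
    print(known_partitioned_at_end(partitioned, [{}, resort], {0}))
```

The design point the model captures: a re-sort is a local operation, so it can preserve partitioning but never establish it; whether the partitioning is still known after "Rebuild Workset Properties" depends entirely on what the upstream join declares it forwards.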