kristoffSC commented on PR #21393: URL: https://github.com/apache/flink/pull/21393#issuecomment-1396174097
Hi @tsreaper Thank you for your valuable feedback, no need to apologize :) Please see my response below. > In BlockStatementSplitter you group all single statements between IF/ELSE/WHILEs together and extract each group into a separate method. I'm not only extracting single line statements but also entire if/else calls. Other than that your understanding is correct. > (If my guesses are correct, you can implement this with just one for loop and a recursive call. No need for mergeBlocks or addGroups or something else. mergeBlocks and addGroups confuse me quick a lot.) I think I'm already doing this. There is a for loop for block statements. Every block has its own visitor. There is no recursion on calling visit method. The only recursion is to traverse through all visitors and get blocks from them. This is straightforward parent -> child structure. I'm not merging any blocks, I'm simply gathering all blocks from all visitors. Please take a look at my comment -> https://github.com/apache/flink/pull/21393#discussion_r1071174072 Also please note that `addGroups` method was deleted. > The problem of this idea (or rather, the current implementation) is that, it's doing almost the same thing with the old IfStatementRewriter, plus the currently remaining FunctionSplitter. I don't agree with that statement. Firstly my implementation handle `else if` blocks which original statement did not. Also in some cases, my implementation produced better results than original `IfStatementRewriter`. In other cases - the result was the same as original `IfStatementRewriter`, So we have extra value here. Please look at more examples provided by me at the end of this comment. Also I'm optimizing and grouping IF/ELSE/WHILE branches bodies (block, statements) whereas `FunctionSplitter` assumes that statements is already optimized. That means that `FunctionSplitter` takesentire IF/Else without any extra modifications and adds it to the methods body or extracts it to the new method. So I'm not duplicating `FunctionSplitter`, those operate on different levels. > I guess what you need is just to copy a bit of code in IfStatementRewriter, change them into WHILE and that's all. Maybe changing the name to BlockStatementRewriter. This shall achieve our goals by making the least changes. I do not agree with this statement. `IfStatementRewriter` is specialized in IF/ELSE. It does not process "else if". Originally I was thinking about adding bits of code to `IfStatementRewriter` but I realized that the problem is more complex. We need something that can process nested combination of WHILE's and IF's I couldn't have things like `shouldExtract(blockStatementContext.statement().statement(0))` and `shouldExtract(blockStatementContext.statement().statement(1))` I wanted to have more generic code, that will apply the same logic to all blocks. In fact, one may argue that I actually did ` copy a bit of code in IfStatementRewriter change them into WHILE and that's all. ` You will find bits of `IfStatementRewriter` in "BlockStatementRewriter::rewrite` and "BlockStatementVisitor::visitMethodDeclaration` methods. Other than that, I proposed new solution. Also `IfStatementRewriter` approach for handling nested IF/ELSE is different that in my implementation. If I understand correctly, `IfStatementRewriter` rewrite entire class as many times as there were extracted methods. ``` while (visitor.hasRewrite()) { visitor = new IfStatementVisitor(rewriterCode); rewriterCode = visitor.rewriteAndGetCode(); } ``` The first rewrite call will extract first method, `visitor.hasRewrite()` will return true, we will rewrite again, but this time we will rewrite the previously extracted method, and so on. Every time, the entire class has to be "parsed". In my case we parse class once, extracting new blocks and processing nested expressions by creating new visitors <- NO recursion here. Every visitor has its context, that will become a name prefix of extracted by this visitors. Every extracted block Every new visitor has the same reference to "rewriter" object. This means that every visitor handles only small portion of entire method, and rewrites only its part. I do understand that this change touches core component, so the possible impact is big -> SQL jobs. I can tell that existing integration/end 2 end tests detected issues with previous implementations. I managed to solve those by analyzing original and rewritten code created for test queries. To sum up: 1. recursion -> I really don't think that this should be a problem. Recursion is called only to gather the results, parsing is not using recursion. 2. addBlocks -> removed 3. adding bits to `IfStatementRewriter` I dont think this is the option. I can try to implement your way of processing nested expressions, something similar to `while (visitor.hasRewrite())` from `IfStatementRewriter` if you think that can help pushing this PR. At the end please take a look at some examples: Example 1: Original code: ``` public void myFun1(int[] a, int[] b) throws RuntimeException { if (a[0] == 0) { a[0] = 1; a[1] = 1; } else if (a[1] = 22) { a[1] = b[12]; a[2] = b[22]; } else if (a[3] == 0) { a[3] = b[3]; a[33] = b[33]; } else if (a[4] = 0) { a[4] = b[4]; a[44] = b[44]; } else { a[0] = b[0]; a[1] = b[1]; a[2] = b[2]; } } ``` BlockStatementRewriter result: ``` public void myFun1(int[] a, int[] b) throws RuntimeException { if (a[0] == 0) { myFun1_ifBody0(a, b); } else if (a[1] = 22) { myFun1_ifBody1_ifBody0(a, b); } else if (a[3] == 0) { myFun1_ifBody1_ifBody1_ifBody0(a, b); } else if (a[4] = 0) { myFun1_ifBody1_ifBody1_ifBody1_ifBody0(a, b); } else { myFun1_ifBody1_ifBody1_ifBody1_ifBody1(a, b); } } void myFun1_ifBody1_ifBody1_ifBody0(int[] a, int[] b) throws RuntimeException { a[3] = b[3]; a[33] = b[33]; } void myFun1_ifBody1_ifBody0(int[] a, int[] b) throws RuntimeException { a[1] = b[12]; a[2] = b[22]; } void myFun1_ifBody1_ifBody1_ifBody1_ifBody1(int[] a, int[] b) throws RuntimeException { a[0] = b[0]; a[1] = b[1]; a[2] = b[2]; } void myFun1_ifBody1_ifBody1_ifBody1_ifBody0(int[] a, int[] b) throws RuntimeException { a[4] = b[4]; a[44] = b[44]; } void myFun1_ifBody0(int[] a, int[] b) throws RuntimeException { a[0] = 1; a[1] = 1; } ``` IfStatementRewriter result: ``` public void myFun1(int[] a, int[] b) throws RuntimeException { if (a[0] == 0) { myFun1_trueFilter1(a, b); } else if (a[1] = 22) { a[1] = b[12]; a[2] = b[22]; } else if (a[3] == 0) { a[3] = b[3]; a[33] = b[33]; } else if (a[4] = 0) { a[4] = b[4]; a[44] = b[44]; } else { a[0] = b[0]; a[1] = b[1]; a[2] = b[2]; } } void myFun1_trueFilter1(int[] a, int[] b) throws RuntimeException{ a[0] = 1; a[1] = 1; } ``` Example 2: Original code: ``` public void myFun1(int[] a, int[] b) throws RuntimeException { if (a[0] == 0) { System.out.println("0"); System.out.println("0"); if (a[1] == 0) { System.out.println("1"); System.out.println("2"); if (a[2] == 0) { a[2] = 1; a[22] = 1; } else { a[2] = b[2]; a[22] = b[2]; } } else { a[1] = b[1]; a[2] = b[2]; } } else { System.out.println("3"); System.out.println("3"); if (a[1] == 1) { a[0] = b[0]; a[1] = b[1]; a[2] = b[2]; } else { a[0] = 2 * b[0]; a[1] = 2 * b[1]; a[2] = 2 * b[2]; } } } ``` BlockStatementRewriter result: ``` public void myFun1(int[] a, int[] b) throws RuntimeException { if (a[0] == 0) { myFun1_rewriteGroup1(a, b); } else { myFun1_rewriteGroup3(a, b); } } void myFun1_ifBody0_0_ifBody0_0_ifBody1(int[] a, int[] b) throws RuntimeException { a[2] = b[2]; a[22] = b[2]; } void myFun1_ifBody0_0_ifBody0_0_ifBody0(int[] a, int[] b) throws RuntimeException { a[2] = 1; a[22] = 1; } void myFun1_ifBody0_0_ifBody1(int[] a, int[] b) throws RuntimeException { a[1] = b[1]; a[2] = b[2]; } void myFun1_rewriteGroup1(int[] a, int[] b) throws RuntimeException { myFun1_ifBody0_0(a, b); if (a[1] == 0) { myFun1_rewriteGroup0_rewriteGroup1(a, b); } else { myFun1_ifBody0_0_ifBody1(a, b); } } void myFun1_ifBody1_0(int[] a, int[] b) throws RuntimeException { System.out.println("3"); System.out.println("3"); } void myFun1_ifBody0_0(int[] a, int[] b) throws RuntimeException { System.out.println("0"); System.out.println("0"); } void myFun1_ifBody0_0_ifBody0_0(int[] a, int[] b) throws RuntimeException { System.out.println("1"); System.out.println("2"); } void myFun1_rewriteGroup3(int[] a, int[] b) throws RuntimeException { myFun1_ifBody1_0(a, b); if (a[1] == 1) { myFun1_ifBody1_0_ifBody0(a, b); } else { myFun1_ifBody1_0_ifBody1(a, b); } } void myFun1_ifBody1_0_ifBody0(int[] a, int[] b) throws RuntimeException { a[0] = b[0]; a[1] = b[1]; a[2] = b[2]; } void myFun1_ifBody1_0_ifBody1(int[] a, int[] b) throws RuntimeException { a[0] = 2 * b[0]; a[1] = 2 * b[1]; a[2] = 2 * b[2]; } void myFun1_rewriteGroup0_rewriteGroup1(int[] a, int[] b) throws RuntimeException { myFun1_ifBody0_0_ifBody0_0(a, b); if (a[2] == 0) { myFun1_ifBody0_0_ifBody0_0_ifBody0(a, b); } else { myFun1_ifBody0_0_ifBody0_0_ifBody1(a, b); } } ``` IfStatementRewriter result: ``` public void myFun1(int[] a, int[] b) throws RuntimeException { if (a[0] == 0) { myFun1_trueFilter1(a, b); } else { myFun1_falseFilter2(a, b); } } void myFun1_trueFilter1(int[] a, int[] b) throws RuntimeException{ System.out.println("0"); System.out.println("0"); if (a[1] == 0) { myFun1_trueFilter1_trueFilter3(a, b); } else { myFun1_trueFilter1_falseFilter4(a, b); } } void myFun1_trueFilter1_trueFilter3(int[] a, int[] b) throws RuntimeException{ System.out.println("1"); System.out.println("2"); if (a[2] == 0) { myFun1_trueFilter1_trueFilter3_trueFilter7(a, b); } else { myFun1_trueFilter1_trueFilter3_falseFilter8(a, b); } } void myFun1_trueFilter1_trueFilter3_trueFilter7(int[] a, int[] b) throws RuntimeException{ a[2] = 1; a[22] = 1; } void myFun1_trueFilter1_trueFilter3_falseFilter8(int[] a, int[] b) throws RuntimeException{ a[2] = b[2]; a[22] = b[2]; } void myFun1_trueFilter1_falseFilter4(int[] a, int[] b) throws RuntimeException{ a[1] = b[1]; a[2] = b[2]; } void myFun1_falseFilter2(int[] a, int[] b) throws RuntimeException{ System.out.println("3"); System.out.println("3"); if (a[1] == 1) { myFun1_falseFilter2_trueFilter5(a, b); } else { myFun1_falseFilter2_falseFilter6(a, b); } } void myFun1_falseFilter2_trueFilter5(int[] a, int[] b) throws RuntimeException{ a[0] = b[0]; a[1] = b[1]; a[2] = b[2]; } void myFun1_falseFilter2_falseFilter6(int[] a, int[] b) throws RuntimeException{ a[0] = 2 * b[0]; a[1] = 2 * b[1]; a[2] = 2 * b[2]; } ``` Cheers, And happy New Year ;) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
