Xikui Wang has posted comments on this change.

Change subject: [HYR,RT,CLUS] Fixes and Improvements for predistributed Jobs
......................................................................


Patch Set 6:

(7 comments)

I left some comments. One general question: it seems "JobId" is being replaced 
by "PredistributedId" in many places. I TOTALLY agree that the "JobId" at the 
query level (maybe not exactly the query level, but I think you get the idea) 
should be decoupled from the runtime JobId, especially for the predistributed 
job case where one p-job may have multiple invocations running at the same 
time. My questions/concerns are: 1. If we are replacing JobId here, should we 
call this new Id "predistributedId", given that it affects not only p-jobs but 
all jobs? 2. We'd have to make sure this works for query 
cancellation/recovery/active runtime, etc. Anyway, this patch needs someone 
with better eyes than my broken ones... ;)

https://asterix-gerrit.ics.uci.edu/#/c/2045/6//COMMIT_MSG
Commit Message:

Line 12: Allow predistributed jobs to have a new jobId for each execution
I didn't see you touch the transaction Id in this patch. When I did my fix, I 
had some issues with it when getting pre-distributed jobs to run as expected. 
You may want to double-check that... a test case may help. ;)


https://asterix-gerrit.ics.uci.edu/#/c/2045/6/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/ConstantFoldingRule.java
File 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/ConstantFoldingRule.java:

Line 192:             if 
(expr.getFunctionIdentifier().equals(BuiltinFunctions.IS_MISSING)) {
> How come the constant folding rule did not take care of is_missing() alread
+1. I don't really see the reason why this special case is needed. Could you 
explain more?


https://asterix-gerrit.ics.uci.edu/#/c/2045/6/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/ActivityClusterGraph.java
File 
hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/ActivityClusterGraph.java:

Line 35: 
This does not seem to be needed.


https://asterix-gerrit.ics.uci.edu/#/c/2045/6/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/ClientInterfaceIPCI.java
File 
hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/ClientInterfaceIPCI.java:

Line 56:     private long predistributedId = 0;
I don't think having a counter with auto-increment here is a good idea. It 
would be nice to have something similar to JobIdFactory.
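
To make the suggestion concrete, here is a rough sketch of what a dedicated 
factory could look like. The names PredistributedJobId and 
PredistributedJobIdFactory are hypothetical and are not taken from this 
patch; the only point is that id generation lives in one thread-safe place 
instead of a bare long field on ClientInterfaceIPCI.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: these class names are illustrative only and do not
// exist in the patch under review.
final class PredistributedJobId {
    private final long id;

    PredistributedJobId(long id) {
        this.id = id;
    }

    long getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PredistributedJobId && ((PredistributedJobId) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }

    @Override
    public String toString() {
        return "PJID:" + id;
    }
}

final class PredistributedJobIdFactory {
    // AtomicLong keeps id allocation safe if several client requests
    // arrive concurrently.
    private final AtomicLong next = new AtomicLong();

    PredistributedJobId create() {
        return new PredistributedJobId(next.getAndIncrement());
    }
}
```

A typed id also makes it harder to pass a runtime JobId by accident where a 
predistributed id is expected.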


https://asterix-gerrit.ics.uci.edu/#/c/2045/6/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/DestroyJobWork.java
File 
hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/DestroyJobWork.java:

Line 44:                 node.getNodeController().destroyJob(predestributedId);
Are we certain that this wouldn't cause problems for query cancellation/active 
runtimes/recovery? I don't know. Probably someone as knowledgeable as @Abdullah 
should have a look at this...


https://asterix-gerrit.ics.uci.edu/#/c/2045/6/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-common/src/main/java/org/apache/hyracks/control/common/ipc/CCNCFunctions.java
File 
hyracks-fullstack/hyracks/hyracks-control/hyracks-control-common/src/main/java/org/apache/hyracks/control/common/ipc/CCNCFunctions.java:

Line 834:             for (int i = 0; i < runTimeVarsSize; i++) {
->runtimeVars. I think runtime is one word.


https://asterix-gerrit.ics.uci.edu/#/c/2045/6/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/PredistributedJobsTest.java
File 
hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/PredistributedJobsTest.java:

Line 49: public class PredistributedJobsTest {
It would be nice to have a test case with a very simple query like 
"upsert into ds ({id:8, val: (select val from ds where ds.id = 8) + 1});" that 
invokes the predistributed job 100 times at the SAME time. To make sure there 
are multiple instances of the p-job running at the same time, the query can be 
made more time-consuming. My experience building my own version of the fix 
tells me that this would be a very good test case....
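
As a rough illustration of the "100 times at the SAME time" part, the helper 
below releases all invocations from a common start gate. ConcurrentInvoker 
and invokeAllAtOnce are hypothetical names, and the Callable is only a 
stand-in for "start one invocation of the predistributed job and wait for it 
to finish"; the real test would plug the actual client call in there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical test helper, not part of the patch under review.
final class ConcurrentInvoker {
    // Runs `invocation` on n threads, releasing them together, and returns
    // the n results in submission order.
    static <T> List<T> invokeAllAtOnce(int n, Callable<T> invocation) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CountDownLatch ready = new CountDownLatch(n); // every worker has started
        CountDownLatch go = new CountDownLatch(1);    // common start gate
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                futures.add(pool.submit(() -> {
                    ready.countDown();
                    go.await(); // block until all workers are lined up
                    return invocation.call();
                }));
            }
            ready.await();
            go.countDown(); // release all invocations together
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get()); // rethrows any failure from a worker
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With something like this, the test could collect the job id returned by each 
invocation and check that all 100 are distinct, then verify the final value 
of val in the dataset.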


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/2045
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8f493c1fa977d07dfe8a875f9ebe9515d01d1473
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Steven Jacobs <[email protected]>
Gerrit-Reviewer: Dmitry Lychagin <[email protected]>
Gerrit-Reviewer: Ildar Absalyamov <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Steven Jacobs <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Xikui Wang <[email protected]>
Gerrit-Reviewer: abdullah alamoudi <[email protected]>
Gerrit-HasComments: Yes
