Xikui Wang has posted comments on this change.

Change subject: [ASTERIXDB-2152][FUN][COMP] Enable specifying computation location
......................................................................
Patch Set 12:

(2 comments)

Added two comments. One of them obviously exceeds the reviewer-friendly comment size limit. Sorry about that. :)

https://asterix-gerrit.ics.uci.edu/#/c/2114/12/asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties
File asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties:

PS12, Line 121: Invalid computation location
> Yes, but it might be nice to report the invalid location if one is invalid.

Oh, I misunderstood your question; I thought you were asking whether that was possible. I will address this in the next patch.

https://asterix-gerrit.ics.uci.edu/#/c/2114/12/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java
File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java:

PS12, Line 118: setLocationConstraint
> But I'm wondering why a location constraint is always needed for an assign

Alright, I spent some time investigating the constraints. Let me see if I can convince you. :) This applies only to the UDF-in-feed case, since we currently do nothing special for UDF evaluation in ordinary queries.

1. The partition constraint here is slightly different from the locationConstraint on dataset operators, which is tied to physical properties. The location constraint here depends on the computation locations (i.e., partitions) and is decided dynamically during query compilation. The user-specified parallelism level, which is similar to a countConstraint, is translated into locationConstraints by assigning computation locations in a round-robin fashion.

2. We could instead assign only a count constraint and let Hyracks decide at runtime which node to run on. However, in the current implementation the node assignment is random, so it cannot distribute the workload evenly. (PS: there is also a bug in the random assignment; I submitted a separate patch for it.)

3. Another possibility is to do round-robin node assignment in the start task. However, Hyracks treats all tasks equally, so we cannot do round robin for the UDF evaluation tasks only. In that sense, assigning the location constraint here is probably the better option.

4. Currently, the locationConstraint for assign is only set in the feed context. The feed datasource obtains the computation node list, and we use that as the count constraint for UDF evaluation. My feeling is that we have the full workload-distribution information, yet we ignore the detailed answer and cross our fingers hoping that Hyracks gives us a good one.

5. Furthermore, if advanced load balancing is ever implemented in Hyracks, this should definitely go away. :)

--
To view, visit https://asterix-gerrit.ics.uci.edu/2114
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7eed5dac03c2f260507e16cf687162d65787bd1
Gerrit-PatchSet: 12
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Xikui Wang <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Xikui Wang <[email protected]>
Gerrit-HasComments: Yes
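For reviewers who want a concrete picture of the round-robin translation discussed in point 1 above, here is a minimal, self-contained sketch. The class and method names (`RoundRobinLocations`, `assignLocations`) are purely illustrative and are not the actual AsterixDB/Hyracks API; the sketch only shows how a parallelism level could be mapped onto a computation node list evenly.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: translate a user-specified parallelism level
// into per-partition node locations by walking the computation node list
// round robin, so the workload is distributed evenly across nodes.
public class RoundRobinLocations {

    static List<String> assignLocations(List<String> computationNodes, int parallelism) {
        List<String> locations = new ArrayList<>(parallelism);
        for (int partition = 0; partition < parallelism; partition++) {
            // Wrap around the node list so extra partitions reuse nodes evenly.
            locations.add(computationNodes.get(partition % computationNodes.size()));
        }
        return locations;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("nc1", "nc2", "nc3");
        // Parallelism 5 over 3 nodes -> [nc1, nc2, nc3, nc1, nc2]
        System.out.println(assignLocations(nodes, 5));
    }
}
```

Contrast this with the count-constraint-only alternative from point 2: there the runtime picks nodes at random, so two partitions can land on the same node while another node sits idle.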
