Till Westmann has posted comments on this change. Change subject: Implements concurrent query management support. ......................................................................
Patch Set 23: (58 comments) Most comments are around 1) Using "capacity" instead of "resource" in some class (and variable) names. Changing those names seems to capture the semantics better (at least for me). It also avoid the the ambiguity between resource and resources. 2) It seems a bit unintuitive to have the controller of whether a a cluster can accept a job with the job and not with the resource manager. It would be nice to change this if it does make important functionality more complicated. 3) The result of JobManager.get seems to return the same value as before in some cases, but it others the behavior is changed. Need to validate if that is correct. https://asterix-gerrit.ics.uci.edu/#/c/1424/23/asterixdb/asterix-app/src/main/java/org/apache/asterix/api/common/APIFramework.java File asterixdb/asterix-app/src/main/java/org/apache/asterix/api/common/APIFramework.java: Line 317: spec.setJobResourceController( Would it be feasible to just give the requires resources (or a resource provider) to the job spec and to move the decision to a controller that gets the job as a parameter (cf. my comment IJobResourceController)? I think that it'd be more intuitive, to associate the required resources with the job and the control with the resource manager. https://asterix-gerrit.ics.uci.edu/#/c/1424/23/asterixdb/asterix-app/src/main/java/org/apache/asterix/app/resource/ComputationResourceVisitor.java File asterixdb/asterix-app/src/main/java/org/apache/asterix/app/resource/ComputationResourceVisitor.java: Line 71: public class ComputationResourceVisitor implements ILogicalOperatorVisitor<Void, Void> { RequiredCapacityVisitor? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/asterixdb/asterix-app/src/main/java/org/apache/asterix/hyracks/bootstrap/NCApplicationEntryPoint.java File asterixdb/asterix-app/src/main/java/org/apache/asterix/hyracks/bootstrap/NCApplicationEntryPoint.java: Line 267: public NodeResource getAvailableResource() { getCapacity? Line 272: int allCores = Runtime.getRuntime().availableProcessors(); create variables for the 2 numbers and pass those to the constructor? Should be a similar number of lines https://asterix-gerrit.ics.uci.edu/#/c/1424/23/asterixdb/asterix-app/src/test/java/org/apache/asterix/app/resource/ComputationResourceVisitorTest.java File asterixdb/asterix-app/src/test/java/org/apache/asterix/app/resource/ComputationResourceVisitorTest.java: Line 43: public class ComputationResourceVisitorTest { RequiredCapacityVisitorTest? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/asterixdb/asterix-installer/src/test/resources/integrationts/asterix-configuration.xml File asterixdb/asterix-installer/src/test/resources/integrationts/asterix-configuration.xml: Line 221: <value>8MB</value> Wasn't this pushed up just recently? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/application/INCApplicationEntryPoint.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/application/INCApplicationEntryPoint.java: Line 30: NodeResource getAvailableResource(); getCapacity()? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/exceptions/ErrorCode.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/exceptions/ErrorCode.java: Line 46: public static final int JOB_RESOURCE_BEYOND_MAXIMUM = 9; JOB_REQUIREMENTS_EXCEED_CAPACITY Line 47: public static final int NODE_NOT_EXIST = 10; NO_SUCH_NODE https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/JobSpecification.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/JobSpecification.java: Line 82: private IJobResourceController jobResourceConstroller; As mentioned elsewhere: could this be just the require capacity without the controller? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/ComputationResource.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/ComputationResource.java: Line 29: public class ComputationResource implements IComputationResource { ClusterCapacity? Line 84: if (nodeResource == null) { Is there ever a valid reason why this should be null? It seems that we should establish a rule around errors that indicate a bug. Either we use the Java errors (NPE, IllegalArgument, IllegalState) or we introduce equivalent error codes for those or we introduce a single INTERNAL_ERROR code or ...? (maybe not in this change). https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/IComputationResource.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/IComputationResource.java: Line 27: public interface IComputationResource extends IReadOnlyComputationResource { I think that it would be good to call these IClusterCapacity and IReadOnlyClusterCapacity. The reasons are that a) this is about the cluster information that's aggregated from the node information and b) it really is about the maximum (or currently available) capacity of the cluster to accept jobs. Line 43: void setAggregatedAvailableCores(int aggregatedAvailableCores); If we move to the "capacity" naming scheme, I think that we don't need "available" anymore as that's implicit. https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/IJobResourceController.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/IJobResourceController.java: Line 31: */ I'm not sure I understand the interfaces here. It seems that a resource controller is associated with a job and not with the cluster (who owns the resources). It seems more intuitive to have something that knows about the available resources, gets a job (and the job would know about the required resources) and returns a decision (REJECT, ALLOCATE, QUEUE). Does this make sense? Or is there a reason why this would be worse? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/IReadOnlyComputationResource.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/IReadOnlyComputationResource.java: Line 27: public interface IReadOnlyComputationResource { As mentioned before, I think that IReadOnlyClusterCapacity would be better. Line 37: int getAggregatedAvailableCores(); If we move to the "capacity" naming scheme, I think that we don't need "available" anymore as that's implicit. Line 60: int getAvailableCores(String nodeId) throws HyracksException; If we move to the "capacity" naming scheme, I think that we don't need "available" anymore as that's implicit. https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/NodeResource.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/job/resource/NodeResource.java: Line 27: public class NodeResource implements Serializable { NodeCapacity? Line 30: private final long availableMemoryByteSize; memoryByteSize? Line 34: private final int availableCores; cores? Line 36: public NodeResource(long memorySize, int availableCores) { What does it mean, if these parameters are negative? Could we document this case? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-api/src/main/resources/errormsg/en.properties File hyracks-fullstack/hyracks/hyracks-api/src/main/resources/errormsg/en.properties: Line 30: 9=Job requires %1$s, which is more than the maximum available resource %2$s Job requirement %1$s exceeds capacity %2$s https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/NodeControllerState.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/NodeControllerState.java: Line 145: private NodeResource resource; s/resource/capacity/ Line 208: resource = reg.getResource(); s/resource/capacity/ Line 277: public NodeResource getResource() { s/resource/capacity/ https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/cluster/INodeManager.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/cluster/INodeManager.java: Line 43: remove empty line? Line 94: void addNode(String nodeId, NodeControllerState ncState) throws HyracksException; cluster the methods by "applied to one node" and "applies to all nodes"? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/cluster/NodeManager.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/cluster/NodeManager.java: Line 62: return new HashMap<>(ipAddressNodeNameMap); Should we use Collections.unmodifiableMap() here to document the intent? Line 67: return new TreeSet<>(nodeRegistry.keySet()); Should we use Collections.unmodifiableMap() here to document the intent? Line 72: return new HashSet<>(nodeRegistry.values()); Should we use Collections.unmodifiableMap() here to document the intent? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/executor/ActivityClusterPlanner.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/executor/ActivityClusterPlanner.java: Line 58: private final JobExecutor scheduler; rename variable? Line 62: ActivityClusterPlanner(JobExecutor newJobScheduler) { rename variable? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/job/IJobManager.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/job/IJobManager.java: Line 48: * This method gets called when all slave processes have notified the master that their individual parallel Can we call them "worker" instead of "slave"? Line 63: * each individual parallel partition. s/method should dictated all involved slave processes to cleanup the states of each individual parallel partition/method should instruct all involved worker processes to clean the state of each individual parallel partition up/ move those into the usual order of events: add, prepare, final? Line 91: Collection<JobRun> getAllRunningJobs(); remove "All" from the 3 method names? Line 110: List<Exception> getRunHistory(JobId jobId); Move this up to the "single job" methods? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/job/JobManager.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/job/JobManager.java: Line 77: throw new HyracksException(e); Error code? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/scheduler/IJobQueue.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/scheduler/IJobQueue.java: Line 54: List<JobRun> remove(IReadOnlyComputationResource maxResource, IComputationResource availableResource); This interface is also a little confusing. We add a JobRun to the queue, but we remove by giving resources back. It seems the ResouceManager should be the one that manages the returning of resources. Or maybe we just need a different name for this method? https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/scheduler/IResourceManager.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/scheduler/IResourceManager.java: Line 36: IReadOnlyComputationResource getMaximumResource(); s/getMaximumResource/getMaximumCapacity/ Line 41: IComputationResource getCurrentResource(); s/getCurrentResource/getCurrentCapacity/ https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/scheduler/ResourceManager.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/scheduler/ResourceManager.java: Line 32: private IComputationResource maxResource = new ComputationResource(); s/Resource/Capacity/ Line 35: private IComputationResource currentResource = new ComputationResource(); s/Resource/Capacity/ Line 38: public IReadOnlyComputationResource getMaximumResource() { s/Resource/Capacity/ Line 43: public IComputationResource getCurrentResource() { s/Resource/Capacity/ https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/AbstractTaskLifecycleWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/AbstractTaskLifecycleWork.java: Line 55: JobRun run = jobManager.get(jobId); this might return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/GetActivityClusterGraphJSONWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/GetActivityClusterGraphJSONWork.java: Line 44: JobRun run = jobManager.get(jobId); this might return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/GetJobRunJSONWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/GetJobRunJSONWork.java: Line 42: JobRun run = jobManager.get(jobId); this might return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/GetJobStatusWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/GetJobStatusWork.java: Line 43: JobRun run = jobManager.get(jobId); here the implicit deferral of JobManager.get to archived jobs looks right https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/JobCleanupWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/JobCleanupWork.java: Line 57: JobRun run = jobManager.get(jobId); this might return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/JobletCleanupNotificationWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/JobletCleanupNotificationWork.java: Line 51: final JobRun run = jobManager.get(jobId); this might return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/RegisterPartitionAvailibilityWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/RegisterPartitionAvailibilityWork.java: Line 47: JobRun run = jobManager.get(pid.getJobId()); this might return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/RegisterPartitionRequestWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/RegisterPartitionRequestWork.java: Line 45: JobRun run = jobManager.get(pid.getJobId()); this might return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/ReportProfilesWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/ReportProfilesWork.java: Line 42: JobRun run = jobManager.get(profile.getJobId()); this might also return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/TaskCompleteWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/TaskCompleteWork.java: Line 47: JobRun run = jobManager.get(jobId); this might also return an archived job https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/TaskFailureWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/TaskFailureWork.java: Line 42: JobRun run = jobManager.get(jobId); This might also return an archived job. https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/WaitForJobCompletionWork.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-cc/src/main/java/org/apache/hyracks/control/cc/work/WaitForJobCompletionWork.java: Line 58: final IJobStatusConditionVariable cArchivedVar = jobManager.get(jobId); This seems wrong. If the first call to jobManager.get(jobId) returned null, why would this 2nd one return something else? Also, the implementation of JabManager.get suggests that we get an archived job, if we didn't find an active one. https://asterix-gerrit.ics.uci.edu/#/c/1424/23/hyracks-fullstack/hyracks/hyracks-examples/btree-example/btreehelper/src/main/java/org/apache/hyracks/examples/btree/helper/NCApplicationEntryPoint.java File hyracks-fullstack/hyracks/hyracks-examples/btree-example/btreehelper/src/main/java/org/apache/hyracks/examples/btree/helper/NCApplicationEntryPoint.java: Line 44: public NodeResource getAvailableResource() { getCapacity? -- To view, visit https://asterix-gerrit.ics.uci.edu/1424 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8fb6fda57efa139114dd234e08cc7de7129468c8 Gerrit-PatchSet: 23 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Yingyi Bu <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: abdullah alamoudi <[email protected]> Gerrit-HasComments: Yes
