Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2979: Fix scheduling on remote hosts ......................................................................
Patch Set 24: (49 comments) http://gerrit.cloudera.org:8080/#/c/2200/2/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 149: Status status = HostnameToIpAddr(backends[i].hostname, &ip); > The underlying call to getaddrinfo() might return multiple IP addresses to what would be in that jira? http://gerrit.cloudera.org:8080/#/c/2200/19/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 431: > I changed it. it started out with &&/|| on the following line, and then some people gradually drifted away from that. i often find leaving && at the end of the preceding line harder to read, esp. if you mix && and || and have parentheses, because it obscures the logical structure. instead of trying to shoehorn everything into ?:, you can also use an if. leaving the variable unassigned is fine from my perspective, because the declaration immediately precedes the assignment. http://gerrit.cloudera.org:8080/#/c/2200/24/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: Line 360: return backend_maps; does this create another copy? Line 439: AssignmentCtx assignment_ctx; this contains the assignment heap. so this isn't load-balanced across the query? (should it be?) Line 440: BackendCtx backend_ctx(AtomicReadBackendMapsPtr()); can we get the backend config once per query? Line 451: TNetworkAddress backend_hostport; this looks like a write-only variable (except you also read it in l518, but there you could simply pass in backend.address) Line 453: backend_hostport = MakeNetworkAddress(FLAGS_hostname, FLAGS_be_port); where is this recorded? Line 508: // optimal OS buffer cache utilization. i'd say "there's no OS buffer cache to worry about" Line 519: scan_range_locations, backend_ctx, &assignment_ctx, &byte_counters, assignment); since assignment_ctx records the per-backend assignment state, do the counters really need to go into a separate struct? Line 525: } // End of backend host selection. you're not using backend_hostport here Line 534: RecordScanRangeAssignment( backend_hostport, node_id, host_list, formatting Line 542: if (VLOG_FILE_IS_ON) { move this into a member function of the relevant struct (this function is already long enough as it is) Line 565: void SimpleScheduler::RecordScanRangeAssignment( const TNetworkAddress& backend_hostport, formatting Line 568: AssignmentCtx* assignment_ctx, ByteCounters* byte_counters, backend_ctx, assignment_ctx, and byte_counters always show up together. is there a concept here (that a single class should encapsulate)? should this entire function be a member function of that class? Line 596: for (const TScanRangeLocation& location: scan_range_locations.locations) { why are we iterating over these again? it feels like we already selected a particular replica Line 1058: // TODO: Do we have rules how to format and indent lambdas? maybe like a function? Line 1060: return backend_ctx.GetBackendRank(data_locations[a]) which would require extra indentation here and the closing brace on a separate line http://gerrit.cloudera.org:8080/#/c/2200/2/be/src/scheduling/simple-scheduler.h File be/src/scheduling/simple-scheduler.h: Line 263: /// reservation. The map is used to cancel queries whose reservation has been preempted. > They can be either. I renamed them to describe what they mean. I also simpl whether the implementation needs to do something with the strings shouldn't be reflected in the signature. the implementation should make copies if that's the case; but if you pass it by value and later on the implementation will not need to modify the params anymore, you'll most likely still pass them by value (because the reason for doing so will have been forgotten). http://gerrit.cloudera.org:8080/#/c/2200/24/be/src/scheduling/simple-scheduler.h File be/src/scheduling/simple-scheduler.h: Line 93: /// Map from a backend's IP address to a list of backends running on that node. a host's ip address? Line 96: /// Map from a backend's IP address to the next position in its corresponding list of a host's ip address? also, i would phrase that as "from ... to the next backend to be round-robin scheduled for that host (needed for setups with multiple backends on a single host)". Line 100: /// Map from a backend's hostname to its IP address to support hostname based backend why "backend's hostname" instead of just "hostname"? Line 107: struct BackendMaps { this is the semi-static configuration that is external to the scheduling process itself, so maybe something like BackendConfig or related? there are a lot of maps in this class (and lots of indirection), and it's helpful to remind the reader what can be updated in the course of a scheduling operation Line 114: /// TODO: Better name for BackendAssignmentInfo? AssignmentInfoElement? what's wrong with BackendAssignmentInfo? looks like that's exactly what it is. (just AssignmentInfo/-Element is less specific; not sure the -Element adds anything) Line 117: struct BackendAssignmentInfo { add comments for fields (who sets it, invariants) Line 123: return std::tie(assigned_bytes, random_rank) > i had to look this up. i think it's easier to read when converted to comparison logic that doesn't use ::tie() (and you only need a single line more) Line 128: /// Heap to compute candidates for scan range assignments. By default boost implements you never spell out how exactly that heap is organized (i know what it is, but it's still good to spell it out) Line 138: /// Struct to store backend information in an addressable heap. add information important to the understanding of this data structure: where is it used, who creates it, is this shared between threads? same for the next one. re naming: a log stores information chronologically, does this class do that? "this stores backend information" is needlessly vague. what kind of information? when you name something 'ctx' i assume it provides context for some specific operation or data structure. is this simply the per-backend assignment state? Line 148: AssignmentHeap backend_heap_; confusing to name the type and the variable differently Line 153: /// TODO: Better name for BackendCtx? what does this provide context for? Line 157: class BackendCtx { what is the logical division between this struct and AssignmentCtx? is there one? Line 161: bool HasUnusedBackends() const; comment Line 181: const BackendMap& backend_map() const { return backend_maps_->backend_map; } too many similar sounding names also make the code harder to follow Line 194: /// Store a random permutation of backend hosts to select backends from. provide class comment that gives context: why random permutations? etc. right now i can kind of guess what these are for, but that's only because of the conversations we've had. Line 201: /// A struct to track various counts of assigned bytes during scheduling. AssignedBytesCounters then? provides more information than ByteCounters Line 202: /// TODO: Move this into AssignmentCtx? good question, but definitely worth answering Line 211: /// and atomically swap the contents of this pointer. provide an *outline* of the main data structures and the scheduling process in the class comment (just enough to give the reader context which is useful for comprehending the details of the class declaration). right now you kind of have to piece it together from the various member comments. Line 216: mutable boost::mutex backend_maps_lock_; leave todo, maybe in class comment, to switch to kudu's rw locks, this is a very heavily frequented mutex Line 288: /// protecting the access by backend_maps_lock_. "access with" Line 292: BackendMapsPtr AtomicReadBackendMapsPtr() const; or simply Get-/SetBackendMapsPtr()? (or Get-/SetBackendConfig?) Line 333: /// of scan ranges. i don't understand the meaning of the last sentence. Line 339: /// processed first. If multiple replicas also run a backend, then the 'memory distance' what does "if multiple replicas also run a backend" mean? Line 340: /// of each backend is considered. The concept of memory distance reflects the cost of good to point out: memory distance should roughly reflect the effort involved of moving the data into the processing backend's main memory Line 349: /// the amount of work each backend has been assigned so far. you could also paraphrase the last sentence as "they will be load-balanced by assigned bytes across all backends" Line 353: /// replica_preference: This value is used as a minimum memory distance for all feel free to insert break after ':' and then use the full line width, to reduce blank space Line 364: /// schedule_random_replica: When equivalent backends are found for a scan range (same is this called node_random_replica below? PS24, Line 353: replica_preference: This value is used as a minimum memory distance for all : /// replicas. For example, by setting this to DISK_LOCAL, all : /// cached replicas will be treated as if they were not cached, : /// but local disk replicas. This can help prevent hot-spots by : /// spreading the assignments over more replicas. Allowed values : /// are CACHE_LOCAL, DISK_LOCAL and REMOTE. : /// : /// disable_cached_reads: Setting this value to true is equivalent to setting : /// replica_preference to DISK_LOCAL and takes precedence over : /// replica_preference. : /// : /// schedule_random_replica: When equivalent backends are found for a scan range (same : /// memory distance, same amount of assigned work), then : /// the first one will be picked deterministically. This aims : /// to make better use of OS buffer caches, but can lead to : /// performance bottlenecks on single hosts. Setting this : /// option to true will randomly change the order in which : /// equivalent replicas are picked for different plan nodes. : /// This helps to compute a more even assignment, on the : /// downside of increased memory usage for OS buffer caches. > It be useful to mention the defaults of these query options here. "with the downside being" Line 380: /// Add a single assignment to 'assignment'. provide complete comment (at least point out how output vars are updated and side effects, if any) Line 434: /// backend_ctx: Provides information on backends and the hosts they run on. at this point i already forgot whether this was config information about backends or assignment info. Line 445: const IpAddr* SelectRemoteBackendHost(const AssignmentCtx& assignment_ctx, BackendCtx* backend_ctx); long line -- To view, visit http://gerrit.cloudera.org:8080/2200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858 Gerrit-PatchSet: 24 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: anujphadke <[email protected]> Gerrit-HasComments: Yes
