Lars Volker has posted comments on this change.

Change subject: IMPALA-2979: Fix scheduling on remote hosts
......................................................................


Patch Set 22:

(40 comments)

Thanks for the review. Please see PS23.

http://gerrit.cloudera.org:8080/#/c/2200/22/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 236:   for(const BackendList::value_type& backend: backends) {
> for (
Done


Line 271:       for(const TTopicItem& item: delta.topic_entries) {
> formatting (and elsewhere)
Done


Line 312:       // TODO: Currently each non-empty statestore update triggers a 
rebuild of these
> collect bigger todos at the bottom of the class comment in the .h
Done


Line 320:     // TODO: Inject global dependencies to make this code testable.
> same here
Done


Line 465:     lock_guard<mutex> l(backend_map_lock_);
> if you're trying to assign 100k scan ranges, this is still going to cause c
Done


Line 553:       AssignmentHeap::handle_type handle = 
node_assignment.backend_heap.push(new_info);
> you always update backend_heap and backend_handles together (and if you don
Done


Line 1058:   // Reset the random backend ranks and global assignment state.
> use actual var names in comments
Removed the method.


Line 1067:     AssignmentHeap::handle_type handle = 
global_assignment_.backend_heap.push(new_info);
> might as well use an inlined c'tor here
Done


Line 1078:     AssignmentInfo* node_assignment) {
> either change to context or to query_node_assignment, to differentiate more
Done


Line 1079:   // This method is called with a list of local backend candidates, 
so they also must be
> also dcheck(!data_locations.empty())
Done


Line 1080:   // contained in backend_handles and thus in the global_assignment_ 
heap.
> what is backend_handles? what is the list of local backend candidates?
Done


Line 1081:   vector<size_t> candidates;
> what are these?
Added a comment.


Line 1084:   for (size_t i = 0; i < data_locations.size(); ++i) {
> why not just int?
Some compilers warn here because of the different types (size() returns 
size_t). Ours doesn't changed it to int.


Line 1089:       const BackendAssignmentInfo& heap_data = *handle_it->second;
> rename variable to reflect its meaning
I combined both lines into one.


Line 1102:     // candidates in-place.
> more specifically ", and remove all others from 'candidates'" (instead of '
Done


Line 1103:     int64_t min_last_assigend = numeric_limits<int64_t>::max();
> spelling
Done


Line 1108:       const BackendAssignmentInfo& heap_data =
> meaningful name
Done


Line 1109:           *(global_assignment_.backend_handles[backend_ip]);
> who locks this?
Done


Line 1139:     AssignmentInfo* node_assignment) {
> change to query_assignment to differentiate from global_assignment_
Now changed to assignment_ctx. The whole method is much simpler now.


Line 1148:     // implies that the node_assignment heap must contain all of the 
handles, because
> all of which handles?
Done


Line 1149:     // backends in the global heap are ordered by the time of their 
last assignment. If
> well, primarily by the number of assigned bytes
Done


Line 1153:     // list in between scheduling runs for multiple scan ranges of 
the same plan node.
> i'm not following.
Done


Line 1158:     // scheduling of multiple scan ranges of the same plan node.
> does this get handled for global_assignment_?
Done


http://gerrit.cloudera.org:8080/#/c/2200/22/be/src/scheduling/simple-scheduler.h
File be/src/scheduling/simple-scheduler.h:

Line 106:     int64_t last_assignment_timestamp;
> comment that this is a logical timestamp, derived from num_assignments_
Done


Line 123:   typedef boost::heap::binomial_heap<BackendAssignmentInfo,
> does this really require anything from boost (and not std)? we're trying to
Yes, we need the heap to be addressable to perform a decrease key operation on 
arbitrary element in the heap. std::priority_queue doesn't have a way to do 
this, and using a vector + std::make_heap is  less efficient O(n) vs O(logn) 
over the number of backends. Of course it might be viable for the numbers we 
expect, let me know if you want me to give it a try.


Line 141:   /// Total number of assignments made during the lifetime of the 
scheduler.
> clarify what you mean by assignment (per query, per plan node, per scan ran
Done


Line 210:   /// data from the given data_loation.
> what is data_location?
Yes, I updated the comment.


Line 216:   /// node_assignments:     Used to track per-query and per-host 
assignment information.
> which backends does this track? is this an output param? the global state?
Updated the comment.


Line 254:   Status ComputeScanRangeAssignmentForQuery(const TQueryExecRequest& 
exec_request,
> isn't the 'for query' implied?
Done


Line 262:   /// First we assemble a list of candidate backends, from which the 
scan range can be
> no ,
Done


Line 263:   /// read. Backends are classified by their memory distance to the 
data, and the closest
> briefly explain memory distance, it's not a standard term
Done


Line 266:   /// if they were further away. Only backends running on one of the 
replicas of the scan
> i don't think you should describe the steps of the algorithm here (you coul
Done


Line 273:   /// list, resulting in a local read. First a search is performed to 
find the candidate
> yep, this is way too detailed.
Done


Line 307:   Status ComputeScanRangeAssignmentForPlanNode(PlanNodeId node_id,
> same here, the parameters differentiate it from 'for query'.
Done


Line 350:   /// Lookup the IP address of 'hostname' and return whether the 
lookup was successful. If
> lookup is a noun, to look up is the verb.
Done


Line 356:   /// Resolve a host to one of its IP addresses by calling 
getaddrinfo() and sorting the
> comment on semantics of result, not how it's implemented
Done


Line 358:   Status HostnameToIpAddr(const Hostname& hostname, IpAddr* ip);
> on the face of it this looks like LookUpBackendIp. how is it different?
One looks at known backends only, the other on tries to resolve via the OS. I 
added to both comments.


Line 371:   /// minimum number of assigned bytes. If backends have been 
assigned equal amounts of
> i guess this is "with the minimum number of assigned bytes (according to 'c
Done


Line 376:       bool global_backend_order, AssignmentInfo* context);
> is this the global assignment info?
Added to the comment.


Line 378:   /// Among backend hosts collocated with 'data_locations', select 
the one with the
> collocated?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858
Gerrit-PatchSet: 22
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
Gerrit-HasComments: Yes

Reply via email to