Lars Volker has posted comments on this change.

Change subject: IMPALA-2979: Fix scheduling on remote hosts
......................................................................


Patch Set 19:

(26 comments)

Thank you for the review. I addressed your comments, please see PS 22.

http://gerrit.cloudera.org:8080/#/c/2200/19/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 140: 
> comment on what you're doing here ("construct backend_map_")
Done


Line 179: 
> remove blank line
Done


Line 378:   BackendMap::iterator entry = backend_map_.end();
> why initialize?
https://google.github.io/styleguide/cppguide.html#Local_Variables

I converted it to a ternary expression.


Line 390:     for (const auto& location : data_locations) {
> single line
Done


Line 431:         node.__isset.hdfs_scan_node &&
> make indentation reflect logical structure:
I changed it.

I grepped through the be code and leaving the && and || at the previous line 
seems to be the predominant convention. Is there some easy rule to follow?


Line 454:   lock_guard<mutex> lock(backend_map_lock_);
> please time this function call, this global lock is a bit worrying.
I moved the lock into the outermost loop, so it is acquired and released per 
scan range now.


Line 495:       vector<string> replica_candidates;
> ip or hostname? since there is so much confusion around it and the potentia
IpAddrs. I added a typedef to make it more clear.


Line 513:         LookUpBackendIp(replica_host.hostname, &replica_ip);
> looks like a contradiction: Lookup*Backend*Ip returns a replica_ip
Done


Line 533:           global_backend_order || cached_replica, &context, 
&backend));
> shouldn't global_backend_order always be on for remote reads/only ever not 
Yes, remote backends are always selected by global rank. I made it more clear 
in the code.


Line 537:     uint64_t scan_range_length = 0;
> please no unsigned ints, unless you're dealing with bitmaps.
Done


Line 552:     {
> formatting
Done


Line 554:             scan_range_length,
> please use a d'tor, this is too hard to read
Done


Line 580:       (*handle).assigned_bytes += scan_range_length;
        :       // TODO: use increase or decrease here. This should merely be a 
small performance
        :       // improvement, change this once everything is functioally 
correct.
        :       global_assignment_.backend_heap.update(handle);
> This is a bug. Those lines must be removed, otherwise we track assigned byt
Done


Line 1150:   // Pick the first one, then move it to the back of the queue.
> implement early exit if size == 1
Done


http://gerrit.cloudera.org:8080/#/c/2200/19/be/src/scheduling/simple-scheduler.h
File be/src/scheduling/simple-scheduler.h:

Line 45: //
> you need to describe the overall algorithm somewhere. this is a good place.
I added an overview to ComputeScanRangeAssignments. Should I extend it to 
explain what the overall class does when Schedule() is called?


Line 96:   boost::unordered_map<std::string, int> backend_rank_;
> it's random, so maybe random_backend_rank_.
Done


Line 99:   struct HeapData {
> give this a meaningful name (HeapData is like calling a int* 'pointer_var')
Done


Line 101:     uint64_t last_assigned;
> what is the unit? (-> meaningful and specific name)
Done


Line 119:       boost::heap::compare<std::greater<HeapData> > > AssignmentHeap;
> are the spaces in > > > still needed?
They are not since C++11, but I had kept them for reasons of consistency 
throughout the file. Removed all of them now.


Line 121:   /// Map to look up handles to heap elements to modify heap element 
keys.
> what is the key? you can also typedef std::string IpAddr if that's what it 
Done


Line 134:   QueryContext global_assignment_;
> this sounds like a contradiction: QueryContext (= query specific) and globa
Done


Line 205:   /// data_locations:       List of candidate replica IP addresses, 
from which the file
> then maybe call this replica_ip for absolute clarity?
Done


Line 255:   /// TODO: have one context per query?
> cryptic comment, context isn't mentioned anywhere here
Yes, that was meant as a TODO to be fixed in this change, as in "Let's discuss 
how to do this". I'll remove it before this gets merged (or rewrite it if we 
cannot decide yet).


Line 311:   /// round-robin the backends on a single host.
> "round-robin through"
Done


Line 315:   /// has not been assigned for the longest time is chosen. After 
that the backend rank is
> what if !global_backend_order?
Then all backends are considered according to their rank. I adjusted the 
comment to reflect that.


Line 327:   void SelectBackendOnHost(BackendMap::iterator entry, 
TBackendDescriptor* backend);
> const&
Done. However it adds an additional indirection compared to 'const 
BackendMap::iterator'. Should we generally treat iterators as values or as 
pointers?


-- 
To view, visit http://gerrit.cloudera.org:8080/2200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858
Gerrit-PatchSet: 19
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
Gerrit-HasComments: Yes

Reply via email to