Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2979: Fix scheduling on remote hosts
......................................................................


Patch Set 24:

(49 comments)

http://gerrit.cloudera.org:8080/#/c/2200/2/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 149:     Status status = HostnameToIpAddr(backends[i].hostname, &ip);
> The underlying call to getaddrinfo() might return multiple IP addresses to 
what would be in that jira?


http://gerrit.cloudera.org:8080/#/c/2200/19/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 431: 
> I changed it.
it started out with &&/|| on the following line, and then some people gradually 
drifted away from that. i often find leaving && at the end of the preceding 
line harder to read, esp. if you mix && and || and have parentheses, because it 
obscures the logical structure.

instead of trying to shoehorn everything into ?:, you can also use an if. 
leaving the variable unassigned is fine from my perspective, because the 
declaration immediately precedes the assignment.


http://gerrit.cloudera.org:8080/#/c/2200/24/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 360:   return backend_maps;
does this create another copy?


Line 439:   AssignmentCtx assignment_ctx;
this contains the assignment heap. so this isn't load-balanced across the 
query? (should it be?)


Line 440:   BackendCtx backend_ctx(AtomicReadBackendMapsPtr());
can we get the backend config once per query?


Line 451:     TNetworkAddress backend_hostport;
this looks like a write-only variable (except you also read it in l518, but 
there you could simply pass in backend.address)


Line 453:       backend_hostport = MakeNetworkAddress(FLAGS_hostname, 
FLAGS_be_port);
where is this recorded?


Line 508:       //   optimal OS buffer cache utilization.
i'd say "there's no OS buffer cache to worry about"


Line 519:           scan_range_locations, backend_ctx, &assignment_ctx, 
&byte_counters, assignment);
since assignment_ctx records the per-backend assignment state, do the counters 
really need to go into a separate struct?


Line 525:     }  // End of backend host selection.
you're not using backend_hostport here


Line 534:     RecordScanRangeAssignment( backend_hostport, node_id, host_list,
formatting


Line 542:   if (VLOG_FILE_IS_ON) {
move this into a member function of the relevant struct (this function is 
already long enough as it is)


Line 565: void SimpleScheduler::RecordScanRangeAssignment( const 
TNetworkAddress& backend_hostport,
formatting


Line 568:     AssignmentCtx* assignment_ctx, ByteCounters* byte_counters,
backend_ctx, assignment_ctx, and byte_counters always show up together. is 
there a concept here (that a single class should encapsulate)? should this 
entire function be a member function of that class?


Line 596:   for (const TScanRangeLocation& location: 
scan_range_locations.locations) {
why are we iterating over these again? it feels like we already selected a 
particular replica


Line 1058:         // TODO: Do we have rules how to format and indent lambdas?
maybe like a function?


Line 1060:         return backend_ctx.GetBackendRank(data_locations[a])
which would require extra indentation here and the closing brace on a separate 
line


http://gerrit.cloudera.org:8080/#/c/2200/2/be/src/scheduling/simple-scheduler.h
File be/src/scheduling/simple-scheduler.h:

Line 263:   /// reservation. The map is used to cancel queries whose 
reservation has been preempted.
> They can be either. I renamed them to describe what they mean. I also simpl
whether the implementation needs to do something with the strings shouldn't be 
reflected in the signature. the implementation should make copies if that's the 
case; but if you pass it by value and later on the implementation will not need 
to modify the params anymore, you'll most likely still pass them by value 
(because the reason for doing so will have been forgotten).


http://gerrit.cloudera.org:8080/#/c/2200/24/be/src/scheduling/simple-scheduler.h
File be/src/scheduling/simple-scheduler.h:

Line 93:   /// Map from a backend's IP address to a list of backends running on 
that node.
a host's ip address?


Line 96:   /// Map from a backend's IP address to the next position in its 
corresponding list of
a host's ip address?

also, i would phrase that as "from ... to the next backend to be round-robin 
scheduled for that host (needed for setups with multiple backends on a single 
host)".


Line 100:   /// Map from a backend's hostname to its IP address to support 
hostname based backend
why "backend's hostname" instead of just "hostname"?


Line 107:   struct BackendMaps {
this is the semi-static configuration that is external to the scheduling 
process itself, so maybe something like BackendConfig or related? there are a 
lot of maps in this class (and lots of indirection), and it's helpful to remind 
the reader what can be updated in the course of a scheduling operation


Line 114:   /// TODO: Better name for BackendAssignmentInfo? 
AssignmentInfoElement?
what's wrong with BackendAssignmentInfo? looks like that's exactly what it is. 
(just AssignmentInfo/-Element is less specific; not sure the -Element adds 
anything)


Line 117:   struct BackendAssignmentInfo {
add comments for fields (who sets it, invariants)


Line 123:       return std::tie(assigned_bytes, random_rank) >
i had to look this up. i think it's easier to read when converted to comparison 
logic that doesn't use ::tie() (and you only need a single line more)


Line 128:   /// Heap to compute candidates for scan range assignments. By 
default boost implements
you never spell out how exactly that heap is organized (i know what it is, but 
it's still good to spell it out)


Line 138:   /// Struct to store backend information in an addressable heap.
add information important to the understanding of this data structure: where is 
it used, who creates it, is this shared between threads?

same for the next one.

re naming: a log stores information chronologically, does this class do that?

"this stores backend information" is needlessly vague. what kind of 
information? 

when you name something 'ctx' i assume it provides context for some specific 
operation or data structure. is this simply the per-backend assignment state?


Line 148:     AssignmentHeap backend_heap_;
confusing to name the type and the variable differently


Line 153:   /// TODO: Better name for BackendCtx?
what does this provide context for?


Line 157:   class BackendCtx {
what is the logical division between this struct and AssignmentCtx? is there 
one?


Line 161:     bool HasUnusedBackends() const;
comment


Line 181:     const BackendMap& backend_map() const { return 
backend_maps_->backend_map; }
too many similar sounding names also make the code harder to follow


Line 194:     /// Store a random permutation of backend hosts to select 
backends from.
provide class comment that gives context: why random permutations? etc.

right now i can kind of guess what these are for, but that's only because of 
the conversations we've had.


Line 201:   /// A struct to track various counts of assigned bytes during 
scheduling.
AssignedBytesCounters then? provides more information than ByteCounters


Line 202:   /// TODO: Move this into AssignmentCtx?
good question, but definitely worth answering


Line 211:   /// and atomically swap the contents of this pointer.
provide an *outline* of the main data structures and the scheduling process in 
the class comment (just enough to give the reader context which is useful for 
comprehending the details of the class declaration). right now you kind of have 
to piece it together from the various member comments.


Line 216:   mutable boost::mutex backend_maps_lock_;
leave todo, maybe in class comment, to switch to kudu's rw locks, this is a 
very heavily frequented mutex


Line 288:   /// protecting the access by backend_maps_lock_.
"access with"


Line 292:   BackendMapsPtr AtomicReadBackendMapsPtr() const;
or simply Get-/SetBackendMapsPtr()? (or Get-/SetBackendConfig?)


Line 333:   /// of scan ranges.
i don't understand the meaning of the last sentence.


Line 339:   /// processed first. If multiple replicas also run a backend, then 
the 'memory distance'
what does "if multiple replicas also run a backend" mean?


Line 340:   /// of each backend is considered. The concept of memory distance 
reflects the cost of
good to point out: memory distance should roughly reflect the effort involved 
of moving the data into the processing backend's main memory


Line 349:   /// the amount of work each backend has been assigned so far.
you could also paraphrase the last sentence as "they will be load-balanced by 
assigned bytes across all backends"


Line 353:   /// replica_preference: This value is used as a minimum memory 
distance for all
feel free to insert break after ':' and then use the full line width, to reduce 
blank space


Line 364:   /// schedule_random_replica: When equivalent backends are found for 
a scan range (same
is this called node_random_replica below?


PS24, Line 353: replica_preference: This value is used as a minimum memory 
distance for all
              :   ///                     replicas. For example, by setting 
this to DISK_LOCAL, all
              :   ///                     cached replicas will be treated as if 
they were not cached,
              :   ///                     but local disk replicas. This can 
help prevent hot-spots by
              :   ///                     spreading the assignments over more 
replicas. Allowed values
              :   ///                     are CACHE_LOCAL, DISK_LOCAL and 
REMOTE.
              :   ///
              :   /// disable_cached_reads: Setting this value to true is 
equivalent to setting
              :   ///                       replica_preference to DISK_LOCAL 
and takes precedence over
              :   ///                       replica_preference.
              :   ///
              :   /// schedule_random_replica: When equivalent backends are 
found for a scan range (same
              :   ///                          memory distance, same amount of 
assigned work), then
              :   ///                          the first one will be picked 
deterministically. This aims
              :   ///                          to make better use of OS buffer 
caches, but can lead to
              :   ///                          performance bottlenecks on 
single hosts. Setting this
              :   ///                          option to true will randomly 
change the order in which
              :   ///                          equivalent replicas are picked 
for different plan nodes.
              :   ///                          This helps to compute a more 
even assignment, on the
              :   ///                          downside of increased memory 
usage for OS buffer caches.
> It be useful to mention the defaults of these query options here.
"with the downside being"


Line 380:   /// Add a single assignment to 'assignment'.
provide complete comment (at least point out how output vars are updated and 
side effects, if any)


Line 434:   /// backend_ctx:        Provides information on backends and the 
hosts they run on.
at this point i already forgot whether this was config information about 
backends or assignment info.


Line 445:   const IpAddr* SelectRemoteBackendHost(const AssignmentCtx& 
assignment_ctx, BackendCtx* backend_ctx);
long line


-- 
To view, visit http://gerrit.cloudera.org:8080/2200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858
Gerrit-PatchSet: 24
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
Gerrit-HasComments: Yes

Reply via email to