Henry Robinson has posted comments on this change.

Change subject: PREVIEW: IMPALA-2550 Introduce query-wide execution context.
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/3686/2/be/src/runtime/query-exec-state.h
File be/src/runtime/query-exec-state.h:

Line 22: class QueryExecState {
> I think we need some new terminology to distinguish this from the client-fa
+1 to all of these suggestions (but wait to do any giant rename until this 
patch firms up!).


Line 42: 
Other things I'd expect to see in this class:

* Query mem tracker
* Query ID
* TQueryCtx?
* All singleton getters from RuntimeState - but consider whether this will make 
code that only has a RuntimeState harder to deal with. 
* List of fragment instance IDs (so I can easily get at fragments grouped by 
query ID)
* Coordinator reporting RPC should move to here, with a queue of statuses 
enqueued by fragments (or maybe polled by a single thread?). Will need this for 
IMPALA-3825. Don't need to do it in this patch, but thinking out loud.
* Coordinator address etc.
* For the future: maybe make RuntimeFilterBank a per-query object.


http://gerrit.cloudera.org:8080/#/c/3686/2/be/src/service/fragment-mgr.h
File be/src/service/fragment-mgr.h:

PS2, Line 50: Status ExecPlanFragment(const TExecPlanFragmentParams& params);
when you have batching you should have a method like ExecFragmentsForQuery() or 
somesuch which should explicitly have responsibility for constructing the 
QueryExecState. You could move in that direction now, even if the method would 
be called several times.


PS2, Line 80: std::shared_ptr<
> The pitfall with shared_ptr is that the destructor for QueryExecState can r
Don't disagree with the destructor concerns, but there's a use case that we 
would need to support. Often the debug web pages will want to loop over all 
query exec states to build a summary view. Since that happens concurrently with 
query execution, we need some way to deal with the fact that the web page 
handler might finish with the exec state after the fragment has finished.

One possibility is just taking locks in each exec state, which would block the 
fragment clean-up until after the other reader had done with it. That's fine, 
but can lead to a lot of lock bus traffic (but mitigated by spinlocks which 
would make it mostly equivalent to shared_ptr).

But that pattern leads us to the ugly situation we have in 
ImpalaServer::GetQueryExecState(); where we pass a boolean to say if we want to 
lock the exec state before it is returned so that no-one can go remove it from 
the map and clean it up between it being found and it being returned to the 
caller. But then the caller has to adopt the lock. If we use shared_ptr, we 
guarantee that the lifetime of the exec state lasts as long as the caller wants 
it to, even if some other thread goes ahead and removes it from the map. I 
posted a patch to remove this, as I find that pattern so cumbersome and 
error-prone.

My preference is to avoid doing work in the destructor, so that it shouldn't 
matter where it runs. It can be evicted from the map, and 'shut down' (join on 
threads etc) in one place only, but actual object destruction should be ok from 
any client of the QueryExecState class.


PS2, Line 97: boost:
std::unordered_map


-- 
To view, visit http://gerrit.cloudera.org:8080/3686
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c513d08f718699ba0c4cdb90c117aaecf95d7fc
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to