Henry Robinson has posted comments on this change. Change subject: PREVIEW: IMPALA-2550 Introduce query-wide execution context. ......................................................................
Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/3686/2/be/src/runtime/query-exec-state.h File be/src/runtime/query-exec-state.h: Line 22: class QueryExecState { > I think we need some new terminology to distinguish this from the client-fa +1 to all of these suggestions (but wait to do any giant rename until this patch firms up!). Line 42: Other things I'd expect to see in this class: * Query mem tracker * Query ID * TQueryCtx? * All singleton getters from RuntimeState - but consider whether this will make code that only has a RuntimeState harder to deal with. * List of fragment instance IDs (so I can easily get at fragments grouped by query ID) * Coordinator reporting RPC should move to here, with a queue of statuses enqueued by fragments (or maybe polled by a single thread?). Will need this for IMPALA-3825. Don't need to do it in this patch, but thinking out loud. * Coordinator address etc. * For the future: maybe make RuntimeFilterBank a per-query object. http://gerrit.cloudera.org:8080/#/c/3686/2/be/src/service/fragment-mgr.h File be/src/service/fragment-mgr.h: PS2, Line 50: Status ExecPlanFragment(const TExecPlanFragmentParams& params); when you have batching you should have a method like ExecFragmentsForQuery() or somesuch which should explicitly have responsibility for constructing the QueryExecState. You could move in that direction now, even if the method would be called several times. PS2, Line 80: std::shared_ptr< > The pitfall with shared_ptr is that the destructor for QueryExecState can r Don't disagree with the destructor concerns, but there's a use case that we would need to support. Often the debug web pages will want to loop over all query exec states to build a summary view. Since that happens concurrently with query execution, we need some way to deal with the fact that the web page handler might finish with the exec state after the fragment has finished. One possibility is just taking locks in each exec state, which would block the fragment clean-up until after the other reader had done with it. That's fine, but can lead to a lot of lock bus traffic (but mitigated by spinlocks which would make it mostly equivalent to shared_ptr). But that pattern leads us to the ugly situation we have in ImpalaServer::GetQueryExecState(); where we pass a boolean to say if we want to lock the exec state before it is returned so that no-one can go remove it from the map and clean it up between it being found and it being returned to the caller. But then the caller has to adopt the lock. If we use shared_ptr, we guarantee that the lifetime of the exec state lasts as long as the caller wants it to, even if some other thread goes ahead and removes it from the map. I posted a patch to remove this, as I find that pattern so cumbersome and error-prone. My preference is to avoid doing work in the destructor, so that it shouldn't matter where it runs. It can be evicted from the map, and 'shut down' (join on threads etc) in one place only, but actual object destruction should be ok from any client of the QueryExecState class. PS2, Line 97: boost: std::unordered_map -- To view, visit http://gerrit.cloudera.org:8080/3686 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c513d08f718699ba0c4cdb90c117aaecf95d7fc Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
