Lars and I spoke about this yesterday, and about how IMPALA-3825 <https://issues.cloudera.org/browse/IMPALA-3825> would tie into this.
How filter aggregation works now is that all backends that produce filters send them to the coordinator, the coordinator aggregates filters with the same ID, and it then broadcasts these aggregated filters to all the backends. The goal of IMPALA-3825 is to *not* aggregate all the filters at the coordinator (to alleviate memory pressure there), but instead to route different filters to different backends, have those backends aggregate all the filters that share a filter ID, and then send the results out to the intended targets.

To do this, the coordinator first needs to send a 'filter routing table' to the backends that will be in charge of aggregation. So where would a backend store this routing table? The challenge here is that filter aggregation does not fall under any specific fragment, so the objects used by the aggregation code (like the routing table) need to live as long as all the fragments on that backend do. Since we have no class that provides this at the query level, Lars' patch above is a prerequisite for this work, as it addresses this very need.

Here is how this might build on the above patch: if a routing table arrives at a backend before the first fragment, the RPC receiving that routing table would be responsible for setting up the QueryExecState; fragments that arrive after this would use the already set-up QueryExecState. Filter-related state could live within a subclass inside the QueryExecState. (Two rough sketches of what I have in mind are at the bottom of this mail, below the quoted thread.)

On Wed, Jul 20, 2016 at 10:59 AM, Marcel Kornacker <[email protected]> wrote:

> Regarding FragmentExecState:
>
> We could possibly do away with that and migrate whatever pieces of it
> are still needed into RuntimeState (and possibly rename RuntimeState
> to make it conform to the naming hierarchy). FragmentExecState is a
> bit of a misnomer at the moment anyway, because it's really
> FragmentInstanceExecState. If we really need a per-fragment struct, we
> can always introduce that.
>
> However, that's a larger change, so maybe stage that after the
> introduction of QueryExecMgr/QueryExecState.
>
> On Wed, Jul 20, 2016 at 10:24 AM, Lars Volker <[email protected]> wrote:
> > Sure. Currently we duplicate state in fragment instances of a single
> > query; for example, each plan fragment executor creates its own
> > descriptor table, which could actually be shared among all fragment
> > instances. For RPC batching (IMPALA-2550) we will transmit the
> > descriptor table only once, and it seems wasteful to re-build it for
> > every fragment instance. This probably holds true for other
> > information that is actually the same across multiple fragment
> > instances of a query, such as runtime filters.
> >
> > On Wed, Jul 20, 2016 at 7:15 PM, Henry Robinson <[email protected]> wrote:
> >
> >> Could you say something about the use cases you're trying to
> >> address? That'll make it easier to review the design.
> >>
> >> On 20 July 2016 at 10:08, Lars Volker <[email protected]> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I started drafting a header file for a query-wide QueryExecState
> >> > class here: https://gerrit.cloudera.org/#/c/3686/2
> >> >
> >> > It's not much and I couldn't figure out how much of the overall
> >> > changes needed for RPC batching should be included in this change.
> >> > Most notably we might want to replace FragmentExecState in
> >> > FragmentMgr (now QueryExecMgr) with QueryExecState altogether and
> >> > drive execution from that class. On the other hand, keeping the
> >> > QueryExecState as simple as possible for now would make the change
> >> > less intrusive.
> >> >
> >> > I'd be glad for feedback on this first draft.
> >> >
> >> > Thanks, Lars
> >>
> >> --
> >> Henry Robinson
> >> Software Engineer
> >> Cloudera
> >> 415-994-6679
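P.S. Sketch 1: a minimal version of the get-or-create pattern described above, where whichever RPC arrives first (routing table or fragment start) sets up the query-wide state and later arrivals share it. All names here (QueryExecMgr, QueryExecState, the TUniqueId stand-in) are my own illustrations, not the actual classes in Lars' patch:

#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Stand-in for Impala's Thrift query id; purely illustrative.
using TUniqueId = std::pair<long, long>;

class QueryExecState {
 public:
  explicit QueryExecState(const TUniqueId& query_id) : query_id_(query_id) {}
  const TUniqueId& query_id() const { return query_id_; }
  // Query-wide state (descriptor table, filter routing table, ...) would
  // live here, outliving any single fragment instance.
 private:
  TUniqueId query_id_;
};

class QueryExecMgr {
 public:
  // Called by both the routing-table RPC and the fragment-start RPC.
  // Whichever arrives first creates the query-wide state; later callers
  // get the already set-up instance.
  std::shared_ptr<QueryExecState> GetOrCreate(const TUniqueId& query_id) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = query_states_.find(query_id);
    if (it == query_states_.end()) {
      it = query_states_.emplace(
          query_id, std::make_shared<QueryExecState>(query_id)).first;
    }
    return it->second;
  }

 private:
  std::mutex lock_;
  std::map<TUniqueId, std::shared_ptr<QueryExecState>> query_states_;
};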

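Sketch 2: how the filter-aggregation state could hang off that query-wide object. Again, everything here (FilterAggregator, FilterRoutingEntry, the simplified BloomFilter) is made up for illustration; the real routing table and filter representation will certainly look different:

#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

struct BloomFilter {
  std::vector<std::uint64_t> directory;
  // OR-in another producer's filter of the same size.
  void MergeFrom(const BloomFilter& other) {
    for (std::size_t i = 0;
         i < directory.size() && i < other.directory.size(); ++i) {
      directory[i] |= other.directory[i];
    }
  }
};

// One entry of the routing table the coordinator sends to the aggregating
// backend ahead of (or alongside) the fragments.
struct FilterRoutingEntry {
  int pending_producers = 0;         // how many backends will send this filter
  std::vector<int> target_backends;  // where to forward the merged filter
};

class FilterAggregator {
 public:
  explicit FilterAggregator(std::map<int, FilterRoutingEntry> routing_table)
      : routing_table_(std::move(routing_table)) {}

  // Merges one producer's update into the running aggregate. Returns true
  // once all expected producers have reported, i.e. when the merged filter
  // is ready to be sent to routing_table_[filter_id].target_backends.
  bool UpdateFilter(int filter_id, const BloomFilter& update) {
    std::lock_guard<std::mutex> l(lock_);
    BloomFilter& merged = merged_filters_[filter_id];
    if (merged.directory.empty()) {
      merged = update;  // first arrival for this filter id
    } else {
      merged.MergeFrom(update);
    }
    return --routing_table_[filter_id].pending_producers == 0;
  }

 private:
  std::mutex lock_;
  std::map<int, FilterRoutingEntry> routing_table_;
  std::map<int, BloomFilter> merged_filters_;
};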