Thanks, Tim. I'm guessing that once we switch these RPCs over to KRPC instead
of Thrift we'll alleviate some of the scalability issues, and then maybe we
can look into increasing the frequency or doing a "push" to the statestore,
etc. I probably won't work on this in the near term, to avoid complicating
the ongoing catalog changes.

-Todd

On Tue, Aug 21, 2018 at 10:22 AM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> This is somewhat relevant for admission control too - I had thought about
> some of these issues in that context, because reducing the latency of
> admission control state propagation helps avoid overadmission, but a very
> high statestore update frequency is inefficient and doesn't scale well to
> larger clusters.
>
> For the catalog updates I agree we could do something with long polls,
> since there's a single producer, so that the "idle" state of the system has
> a thread sitting in the update callback on catalogd waiting for an update.
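>
> To make that concrete, here is a rough sketch of what a long-polling update
> callback could look like (the names and signature are made up for
> illustration; this isn't the actual catalogd code):
>
>   #include <chrono>
>   #include <condition_variable>
>   #include <mutex>
>   #include <string>
>   #include <vector>
>
>   std::mutex lock_;
>   std::condition_variable update_available_;
>   std::vector<std::string> pending_topic_updates_;
>
>   // Statestore update callback on catalogd: rather than returning an empty
>   // delta when nothing is pending, block until a change arrives or the
>   // timeout expires, so the "idle" state is a thread parked here.
>   std::vector<std::string> GetCatalogDelta(std::chrono::milliseconds max_wait) {
>     std::unique_lock<std::mutex> l(lock_);
>     update_available_.wait_for(
>         l, max_wait, [] { return !pending_topic_updates_.empty(); });
>     std::vector<std::string> delta;
>     delta.swap(pending_topic_updates_);
>     return delta;  // empty only if the timeout expired with nothing to send
>   }
>
>   // Producer side (the Java catalog, e.g. via JNI) signals new work.
>   void AddPendingUpdate(std::string entry) {
>     {
>       std::lock_guard<std::mutex> l(lock_);
>       pending_topic_updates_.push_back(std::move(entry));
>     }
>     update_available_.notify_one();
>   }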
>
> I'd also thought at one point about allowing subscribers to notify the
> statestore that they had something to add to the topic. That could be
> treated as a hint to the statestore to schedule the subscriber update
> sooner. This would also work for admission control since coordinators could
> notify the statestore when the first query was admitted after the previous
> statestore update.
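>
> Roughly, the statestore side of that hint could look something like the
> sketch below (made-up names, just to illustrate the scheduling idea):
>
>   #include <algorithm>
>   #include <chrono>
>   #include <map>
>   #include <mutex>
>   #include <string>
>
>   using Clock = std::chrono::steady_clock;
>
>   class SubscriberScheduler {
>    public:
>     explicit SubscriberScheduler(std::chrono::milliseconds interval)
>         : interval_(interval) {}
>
>     // Normal path: the next update for a subscriber is one full interval
>     // after the previous one was sent.
>     void OnUpdateSent(const std::string& subscriber_id) {
>       std::lock_guard<std::mutex> l(lock_);
>       next_update_[subscriber_id] = Clock::now() + interval_;
>     }
>
>     // Hint from a subscriber that it has something to add to the topic:
>     // pull its next scheduled update forward so it gets picked up sooner.
>     void RequestEarlyUpdate(const std::string& subscriber_id) {
>       std::lock_guard<std::mutex> l(lock_);
>       auto it = next_update_.find(subscriber_id);
>       if (it != next_update_.end()) {
>         it->second = std::min(it->second, Clock::now());
>       }
>     }
>
>     bool UpdateDue(const std::string& subscriber_id) {
>       std::lock_guard<std::mutex> l(lock_);
>       auto it = next_update_.find(subscriber_id);
>       return it != next_update_.end() && Clock::now() >= it->second;
>     }
>
>    private:
>     std::mutex lock_;
>     const std::chrono::milliseconds interval_;
>     std::map<std::string, Clock::time_point> next_update_;
>   };
>
> The statestore's update loop would then check UpdateDue() (or get woken)
> instead of always waiting out the full interval for that subscriber.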
>
> On Tue, Aug 21, 2018 at 9:41 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > Hey folks,
> >
> > In my recent forays into the catalog->statestore->impalad metadata
> > propagation code base, I noticed that the latency of any update is
> > typically between 2-4 seconds with the standard 2-second statestore
> > polling interval. That's because the code currently works as follows:
> >
> > 1. in the steady state with no recent metadata changes, the catalogd's
> > state is:
> > -- topic_updates_ready_ = true
> > -- pending_topic_updates_ = empty
> >
> > 2. some metadata change happens, which modifies the version numbers in the
> > Java catalog but doesn't modify any of the C++ side state
> >
> > 3. the next statestore poll happens due to the normal interval expiring.
> > On average, this will take *1/2 the polling interval*
> > -- this sees that pending_topic_updates_ is empty, so returns no results.
> > -- it sets topic_updates_ready_ = false and triggers the "gather" thread
> >
> > 4. the "gather" thread wakes up and gathers updates, filling in
> > 'pending_topic_updates_' and setting 'topic_updates_ready_' back to true
> > (typically subsecond in smallish catalogs, so this happens before the next
> > poll)
> >
> > 5. wait *another full statestore polling interval* (2 seconds) after step
> > #3 above, at which point we deliver the metadata update to the statestore
> >
> > 6. wait on average *1/2 the polling interval* until any particular impalad
> > gets the update from #4
> >
> > So, in the absolute best case, we wait one full polling interval (2
> > seconds), and in the worst case we wait two polling intervals (4 seconds).
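> >
> > In (very simplified) code, the current behavior boils down to something
> > like this -- the member names follow the above, everything else is
> > invented for illustration and isn't the real catalogd code:
> >
> >   #include <mutex>
> >   #include <string>
> >   #include <vector>
> >
> >   std::mutex lock_;
> >   bool topic_updates_ready_ = true;
> >   std::vector<std::string> pending_topic_updates_;
> >
> >   // Placeholder for waking the thread that collects Java-side changes
> >   // into pending_topic_updates_ and sets topic_updates_ready_ again.
> >   void TriggerGatherThread() {}
> >
> >   // Runs on every statestore poll (every polling interval, 2 seconds).
> >   std::vector<std::string> GetCatalogTopicDelta() {
> >     std::lock_guard<std::mutex> l(lock_);
> >     if (pending_topic_updates_.empty()) {
> >       // Nothing staged yet: return an empty delta and kick off the
> >       // gather. Even though the gather usually finishes in well under a
> >       // second, its result is only picked up on the *next* poll, a full
> >       // interval later.
> >       topic_updates_ready_ = false;
> >       TriggerGatherThread();
> >       return {};
> >     }
> >     std::vector<std::string> delta;
> >     delta.swap(pending_topic_updates_);
> >     return delta;
> >   }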
> >
> > Has anyone looked into optimizing this at all? It seems like we could have
> > metadata changes trigger an immediate "collection" into the C++ side, and
> > have the statestore update callback wait ("long poll" style) for an update
> > rather than skip if there is nothing available.
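> >
> > For example, the change-triggered half could be as simple as the sketch
> > below (invented names; the real wiring would go through the JNI bridge
> > and the existing gather thread):
> >
> >   #include <condition_variable>
> >   #include <mutex>
> >   #include <string>
> >   #include <vector>
> >
> >   std::mutex lock_;
> >   std::condition_variable gather_cv_;
> >   bool gather_requested_ = false;
> >   std::vector<std::string> pending_topic_updates_;
> >
> >   // Placeholder for the call that pulls the changed catalog objects.
> >   std::vector<std::string> CollectChangesFromJavaCatalog() { return {}; }
> >
> >   // Called as soon as a metadata change bumps the catalog version,
> >   // instead of waiting for the next statestore poll to notice it.
> >   void OnCatalogVersionBumped() {
> >     {
> >       std::lock_guard<std::mutex> l(lock_);
> >       gather_requested_ = true;
> >     }
> >     gather_cv_.notify_one();
> >   }
> >
> >   // Gather thread: wakes immediately on a change and stages it so that a
> >   // long-polling statestore callback could return it right away.
> >   void GatherThreadLoop() {
> >     std::unique_lock<std::mutex> l(lock_);
> >     for (;;) {
> >       gather_cv_.wait(l, [] { return gather_requested_; });
> >       gather_requested_ = false;
> >       pending_topic_updates_ = CollectChangesFromJavaCatalog();
> >       // ...then notify whatever is waiting in the update callback.
> >     }
> >   }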
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
