One more parting thought: why don't we just call 'GetCatalogDelta()'
directly from the catalog callback to do a direct handoff, instead of
storing the updates in this 'pending' struct? Given that the statestore
uses a dedicated thread per subscriber (right?), it seems like it would be
fine for the update callback to take a long time, no?

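To make the question concrete, here is a rough sketch of what the direct
handoff might look like (names and signatures are made up for illustration,
not the actual CatalogServer code):

  // Rough sketch only -- names/signatures are made up, not the real code.
  // The topic-update callback calls into the Java catalog directly
  // ("direct handoff") instead of reading pending_topic_updates_ that a
  // separate gather thread filled in.
  #include <cstdint>
  #include <mutex>
  #include <string>
  #include <vector>

  struct TopicItem { std::string key; std::string value; };

  class CatalogServerSketch {
   public:
    // Invoked from the statestore subscriber callback (assumed to run on a
    // dedicated per-subscriber thread, per the question above).
    void UpdateCatalogTopicCallback(std::vector<TopicItem>* outgoing) {
      std::lock_guard<std::mutex> l(lock_);
      // Direct handoff: do the (potentially slow) gather right here.
      last_sent_version_ = GetCatalogDelta(last_sent_version_, outgoing);
    }

   private:
    // Stand-in for the JNI call into the Java catalog; returns the highest
    // catalog version included in 'items'.
    int64_t GetCatalogDelta(int64_t from_version,
                            std::vector<TopicItem>* items) {
      // ... call into the JVM, serialize changed objects into 'items' ...
      return from_version;
    }

    std::mutex lock_;
    int64_t last_sent_version_ = 0;
  };
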
-Todd

On Tue, Aug 21, 2018 at 11:09 AM, Todd Lipcon <t...@cloudera.com> wrote:

> Thanks, Tim. I'm guessing once we switch over these RPCs to KRPC instead
> of Thrift we'll alleviate some of the scalability issues and maybe we can
> look into increasing frequency or doing a "push" to the statestore, etc. I
> probably won't work on this in the near term to avoid complicating the
> ongoing changes with catalog.
>
> -Todd
>
> On Tue, Aug 21, 2018 at 10:22 AM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
>> This is somewhat relevant for admission control too - I had thought about
>> some of these issues in that context, because reducing the latency of
>> admission control state propagation helps avoid overadmission, but a very
>> short statestore update interval is inefficient and doesn't scale well to
>> larger clusters.
>>
>> For the catalog updates I agree we could do something with long polls,
>> since it's a single producer, so that the "idle" state of the system has a
>> thread sitting in the update callback on catalogd waiting for an update.
>>
>> I'd also thought at one point about allowing subscribers to notify the
>> statestore that they had something to add to the topic. That could be
>> treated as a hint to the statestore to schedule the subscriber update
>> sooner. This would also work for admission control, since coordinators
>> could notify the statestore when the first query was admitted after the
>> previous statestore update.
>>
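>> A rough sketch of how that hint might look on the statestore side (the
>> class, method names, and RPC are made up for illustration, not the
>> existing statestore code): a "topic dirty" notification just pulls the
>> subscriber's next scheduled update forward:
>>
>>   // Illustrative sketch only -- not the real statestore scheduling code.
>>   #include <chrono>
>>   #include <condition_variable>
>>   #include <mutex>
>>
>>   class SubscriberScheduleSketch {
>>    public:
>>     // Hypothetical RPC handler: a subscriber reports it has something to
>>     // add to a topic, so schedule its next update round immediately.
>>     void NotifyTopicDirty() {
>>       std::lock_guard<std::mutex> l(lock_);
>>       next_update_ = std::chrono::steady_clock::now();
>>       cv_.notify_one();  // Wake this subscriber's update thread early.
>>     }
>>
>>     // Per-subscriber statestore thread: normally sleeps until the regular
>>     // deadline; a dirty hint cuts the sleep short.
>>     void WaitForNextUpdate() {
>>       std::unique_lock<std::mutex> l(lock_);
>>       cv_.wait_until(l, next_update_);
>>       next_update_ = std::chrono::steady_clock::now() + update_interval_;
>>     }
>>
>>    private:
>>     std::mutex lock_;
>>     std::condition_variable cv_;
>>     std::chrono::milliseconds update_interval_{2000};  // 2s default
>>     std::chrono::steady_clock::time_point next_update_ =
>>         std::chrono::steady_clock::now() + update_interval_;
>>   };
>>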
>> On Tue, Aug 21, 2018 at 9:41 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> > Hey folks,
>> >
>> > In my recent forays into the catalog->statestore->impalad metadata
>> > propagation code base, I noticed that the latency of any update is
>> > typically between 2 and 4 seconds with the standard 2-second statestore
>> > polling interval. That's because the code currently works as follows:
>> >
>> > 1. in the steady state with no recent metadata changes, the catalogd's
>> > state is:
>> > -- topic_updates_ready_ = true
>> > -- pending_topic_updates_ = empty
>> >
>> > 2. some metadata change happens, which modifies the version numbers in
>> > the Java catalog but doesn't modify any of the C++ side state
>> >
>> > 3. the next statestore poll happens due to the normal interval expiring.
>> > On average, this will take *1/2 the polling interval*
>> > -- this sees that pending_topic_updates_ is empty, so returns no results.
>> > -- it sets topic_updates_ready_ = false and triggers the "gather" thread
>> >
>> > 4. the "gather" thread wakes up and gathers updates, filling in
>> > 'pending_topic_updates_' and setting 'topic_updates_ready_' back to true
>> > (typically subsecond in smallish catalogs, so this happens before the
>> > next poll)
>> >
>> > 5. wait *another full statestore polling interval* (2 seconds) after
>> > step #3 above, at which point we deliver the metadata update to the
>> > statestore
>> >
>> > 6. wait on average *1/2 the polling interval* until any particular
>> > impalad gets the update from #4
>> >
>> > So, in the absolute best case, we wait one full polling interval
>> > (2 seconds), and in the worst case we wait two polling intervals
>> > (4 seconds).
>> >
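>> > To make that concrete, here is a rough sketch of the shape of the
>> > current two-phase code as I understand it (pending_topic_updates_ and
>> > topic_updates_ready_ are the real member names, everything else is
>> > illustrative):
>> >
>> >   // Simplified sketch of the current behavior -- not the actual code.
>> >   #include <condition_variable>
>> >   #include <mutex>
>> >   #include <string>
>> >   #include <vector>
>> >
>> >   class CatalogTopicSketch {
>> >    public:
>> >     // Statestore poll (step 3): if nothing is staged, kick the gather
>> >     // thread and return empty, so a fresh change is only delivered on
>> >     // the *next* poll, a full interval later (step 5).
>> >     void UpdateCatalogTopicCallback(std::vector<std::string>* outgoing) {
>> >       std::lock_guard<std::mutex> l(lock_);
>> >       if (!topic_updates_ready_) return;  // Gather still in progress.
>> >       if (pending_topic_updates_.empty()) {
>> >         topic_updates_ready_ = false;
>> >         gather_cv_.notify_one();          // Wake GatherThread().
>> >         return;                           // Nothing sent this round.
>> >       }
>> >       outgoing->swap(pending_topic_updates_);  // Deliver staged delta.
>> >     }
>> >
>> >     // "Gather" thread (step 4): collects changes from the Java catalog
>> >     // and stages them for the next poll.
>> >     void GatherThread() {
>> >       while (true) {
>> >         std::unique_lock<std::mutex> l(lock_);
>> >         gather_cv_.wait(l, [this] { return !topic_updates_ready_; });
>> >         pending_topic_updates_ = CollectFromJavaCatalog();
>> >         topic_updates_ready_ = true;
>> >       }
>> >     }
>> >
>> >    private:
>> >     std::vector<std::string> CollectFromJavaCatalog() { return {}; }  // Stub.
>> >
>> >     std::mutex lock_;
>> >     std::condition_variable gather_cv_;
>> >     std::vector<std::string> pending_topic_updates_;
>> >     bool topic_updates_ready_ = true;
>> >   };
>> >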
>> > Has anyone looked into optimizing this at all? It seems like we could
>> > have metadata changes trigger an immediate "collection" into the C++
>> > side, and have the statestore update callback wait ("long poll" style)
>> > for an update rather than skip if there is nothing available.
>> >
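>> > For the record, here is the kind of thing I have in mind, just as an
>> > illustrative sketch (not a patch): metadata changes stage the delta
>> > right away, and the callback waits a bounded time for it instead of
>> > returning empty:
>> >
>> >   // Illustrative sketch of the "long poll" variant -- not a patch.
>> >   #include <chrono>
>> >   #include <condition_variable>
>> >   #include <mutex>
>> >   #include <string>
>> >   #include <vector>
>> >
>> >   class LongPollCatalogTopicSketch {
>> >    public:
>> >     // Called whenever a metadata change bumps the catalog version:
>> >     // collect into the C++ side immediately instead of waiting for
>> >     // the next statestore poll.
>> >     void OnCatalogChanged() {
>> >       std::lock_guard<std::mutex> l(lock_);
>> >       pending_topic_updates_ = CollectFromJavaCatalog();
>> >       ready_cv_.notify_all();
>> >     }
>> >
>> >     // Statestore update callback: long-poll up to 'timeout' for an
>> >     // update rather than skipping when nothing is staged yet.
>> >     void UpdateCatalogTopicCallback(std::vector<std::string>* outgoing,
>> >                                     std::chrono::milliseconds timeout) {
>> >       std::unique_lock<std::mutex> l(lock_);
>> >       ready_cv_.wait_for(l, timeout,
>> >                          [this] { return !pending_topic_updates_.empty(); });
>> >       outgoing->swap(pending_topic_updates_);  // Empty if we timed out.
>> >     }
>> >
>> >    private:
>> >     std::vector<std::string> CollectFromJavaCatalog() { return {}; }  // Stub.
>> >
>> >     std::mutex lock_;
>> >     std::condition_variable ready_cv_;
>> >     std::vector<std::string> pending_topic_updates_;
>> >   };
>> >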
>> > -Todd
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>> >
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera
