GetCatalogDelta() used to block catalogd from executing DDLs while it ran, and the pending struct was yet another cache added to smooth that out a little.
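Roughly, the decoupling looks like this (a simplified, hypothetical sketch of the handoff, not the actual Impala code; only the member names `pending_topic_updates_` and `topic_updates_ready_` come from the thread below):

```cpp
// Hypothetical sketch: the "pending" buffer decouples DDL execution from the
// statestore callback. The gather step fills pending_topic_updates_ off the
// callback path, so a slow GetCatalogDelta() no longer blocks DDLs; the
// callback itself is just an O(1) swap under a lock.
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class CatalogServer {
 public:
  // Gather thread: collect a delta (may be slow) and publish it atomically.
  void GatherDelta(std::vector<std::string> delta) {
    std::lock_guard<std::mutex> l(lock_);
    pending_topic_updates_ = std::move(delta);
    topic_updates_ready_ = true;
  }

  // Statestore callback: return whatever was gathered, or nothing.
  // Never waits on delta collection itself.
  std::vector<std::string> UpdateCallback() {
    std::lock_guard<std::mutex> l(lock_);
    if (!topic_updates_ready_) return {};  // nothing gathered yet: skip
    topic_updates_ready_ = false;
    return std::exchange(pending_topic_updates_, {});
  }

 private:
  std::mutex lock_;
  bool topic_updates_ready_ = false;
  std::vector<std::string> pending_topic_updates_;
};
```

The cost of this buffering is exactly the extra polling-interval of latency Todd measures below.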
On Tue, Aug 21, 2018 at 11:28 AM Todd Lipcon <t...@cloudera.com> wrote:
> One more parting thought: why don't we just call GetCatalogDelta()
> directly from the catalog callback in order to do a direct handoff,
> instead of storing the updates in this "pending" struct? Given that the
> statestore uses a dedicated thread per subscriber (right?), it seems
> like it would be fine for the update callback to take a long time, no?
>
> -Todd
>
> On Tue, Aug 21, 2018 at 11:09 AM, Todd Lipcon <t...@cloudera.com> wrote:
>> Thanks, Tim. I'm guessing that once we switch these RPCs over from
>> Thrift to KRPC we'll alleviate some of the scalability issues, and
>> maybe then we can look into increasing the frequency, doing a "push"
>> to the statestore, etc. I probably won't work on this in the near
>> term, to avoid complicating the ongoing catalog changes.
>>
>> -Todd
>>
>> On Tue, Aug 21, 2018 at 10:22 AM, Tim Armstrong
>> <tarmstr...@cloudera.com> wrote:
>>> This is somewhat relevant for admission control too - I had thought
>>> about some of these issues in that context, because reducing the
>>> latency of admission control state propagation helps avoid
>>> over-admission, but a very short statestore update interval is
>>> inefficient and doesn't scale well to larger clusters.
>>>
>>> For the catalog updates, I agree we could do something with long
>>> polls: since there's a single producer, the "idle" state of the
>>> system would have a thread sitting in the update callback on catalogd
>>> waiting for an update.
>>>
>>> I'd also thought at one point about allowing subscribers to notify
>>> the statestore that they have something to add to a topic, which the
>>> statestore could treat as a hint to schedule that subscriber's update
>>> sooner. This would also work for admission control, since
>>> coordinators could notify the statestore when the first query is
>>> admitted after the previous statestore update.
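Tim's "hint" idea above could be sketched roughly like this (hypothetical code, not Impala's statestore; the class and method names are made up). The per-subscriber statestore thread normally sleeps a full polling interval, but a subscriber that has something to publish can nudge it to run the next update early:

```cpp
// Hypothetical sketch of the subscriber-hint scheduling Tim describes:
// a timed wait on a condition variable, where HintUpdateReady() cuts the
// sleep short so the statestore can poll the subscriber sooner.
#include <chrono>
#include <condition_variable>
#include <mutex>

class SubscriberScheduler {
 public:
  explicit SubscriberScheduler(std::chrono::milliseconds interval)
      : interval_(interval) {}

  // Subscriber side: "I have data" - wake the statestore thread early.
  void HintUpdateReady() {
    std::lock_guard<std::mutex> l(lock_);
    hint_pending_ = true;
    cv_.notify_one();
  }

  // Statestore side: wait out the normal interval OR an early hint.
  // Returns true if woken by a hint, so the caller can poll immediately.
  bool WaitForNextUpdate() {
    std::unique_lock<std::mutex> l(lock_);
    bool hinted = cv_.wait_for(l, interval_, [this] { return hint_pending_; });
    hint_pending_ = false;
    return hinted;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::chrono::milliseconds interval_;
  bool hint_pending_ = false;
};
```

Because the hint is only a scheduling nudge, a lost or duplicate hint is harmless: the regular interval still bounds the worst-case latency.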
>>> On Tue, Aug 21, 2018 at 9:41 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>> Hey folks,
>>>>
>>>> In my recent forays into the catalog->statestore->impalad metadata
>>>> propagation code, I noticed that the latency of any update is
>>>> typically between 2-4 seconds with the standard 2-second statestore
>>>> polling interval. That's because the code currently works as follows:
>>>>
>>>> 1. In the steady state with no recent metadata changes, the
>>>> catalogd's state is:
>>>>    -- topic_updates_ready_ = true
>>>>    -- pending_topic_updates_ = empty
>>>>
>>>> 2. Some metadata change happens, which modifies the version numbers
>>>> in the Java catalog but doesn't modify any of the C++-side state.
>>>>
>>>> 3. The next statestore poll happens when the normal interval
>>>> expires; on average this takes *1/2 the polling interval*.
>>>>    -- It sees that pending_topic_updates_ is empty, so it returns no
>>>>    results.
>>>>    -- It sets topic_updates_ready_ = false and triggers the "gather"
>>>>    thread.
>>>>
>>>> 4. The "gather" thread wakes up and gathers updates, filling in
>>>> pending_topic_updates_ and setting topic_updates_ready_ back to true
>>>> (typically subsecond for smallish catalogs, so this finishes before
>>>> the next poll).
>>>>
>>>> 5. We wait *another full statestore polling interval* (2 seconds)
>>>> after step #3 above, at which point we deliver the metadata update
>>>> to the statestore.
>>>>
>>>> 6. We wait on average *1/2 the polling interval* until any
>>>> particular impalad gets the update from #4.
>>>>
>>>> So in the absolute best case we wait one full polling interval (2
>>>> seconds), and in the worst case we wait two polling intervals (4
>>>> seconds).
>>>>
>>>> Has anyone looked into optimizing this at all?
>>>> It seems like we could have metadata changes trigger an immediate
>>>> "collection" into the C++ side, and have the statestore update
>>>> callback wait ("long poll" style) for an update rather than skip if
>>>> there is nothing available.
>>>>
>>>> -Todd
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera

--
Tianyi Wang
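P.S. The long-poll handoff Todd proposes could look roughly like this (again a hypothetical sketch, not a patch; `PublishDelta`/`WaitForDelta` are invented names):

```cpp
// Hypothetical sketch of the long-poll handoff: a metadata change triggers
// collection immediately, and the statestore callback blocks (with a bound)
// until a delta is ready instead of returning empty and waiting a full cycle.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class LongPollTopic {
 public:
  // Catalog side: called as soon as a DDL bumps the catalog version.
  void PublishDelta(std::vector<std::string> delta) {
    std::lock_guard<std::mutex> l(lock_);
    pending_ = std::move(delta);
    cv_.notify_one();
  }

  // Statestore callback: long-poll up to `timeout` for a delta, so an
  // update is handed off as soon as it exists rather than on the next poll.
  std::optional<std::vector<std::string>> WaitForDelta(
      std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    if (!cv_.wait_for(l, timeout, [this] { return pending_.has_value(); })) {
      return std::nullopt;  // timed out: no metadata changed
    }
    auto delta = std::move(*pending_);
    pending_.reset();
    return delta;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::optional<std::vector<std::string>> pending_;
};
```

With a dedicated statestore thread per subscriber, blocking in `WaitForDelta` is safe, and the 1-2 polling intervals of latency in the walkthrough above collapse to roughly the gather time plus one propagation hop.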