I'm certainly aware of tools that read TEventSequence and expect it to have certain structures. If we change the profiles significantly, we should insert a version sigil in there so that a client could at least figure out that they're looking at something new.
A bigger modeling conversation is the use of string keys everywhere. For the timeline, for example, it's a list of (description, time) pairs, with the descriptions being strings. An alternative would have been to put the strings in Thrift itself. The latter, I think, makes tooling easier to write (since you're calling "get_query_started_time()" instead of "for description, time in timelines: if description == 'query started': return time". I recognize working with Thrift is kind of a pain, but I want to throw the thought out there if we're adding things to profiles. -- Philip On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <[email protected]> wrote: > Hey folks, > > I'm working on a patch to add some more diagnostics from the planning > process into query profiles. > > Currently, only the planning "Timeline" is reported back as part of the > response to createExecRequest. As part of the fetch-on-demand catalog work > I'd like to start exposing various counters such as cache hit/miss counts, > time spent on remote calls to the catalog, etc. Even in the existing code > paths, I can see some similar metrics being useful. > > My current thinking is to remove the 'timeline' (TEventSequence) field > from TExecRequest and replace it with a full TRuntimeProfileNode. I'd then > add some capability in the Java side to fill in counters, etc, in this > structure. > > Any concerns with this approach before I go down this path? Are there any > compatibility guarantees I need to uphold with the profile output of > queries? > > -Todd > -- > Todd Lipcon > Software Engineer, Cloudera >
