Hi, I have a question related to the subject. What do you think: should we also track EXPLAIN queries? I see reasons both to skip them and to include them in the history:
Pros: we would have the full picture and see all queries.
Cons: such queries can be considered investigation/debug/service queries and can push real queries out of the history.

What is your opinion?

On Fri, Dec 21, 2018 at 7:23 PM Юрий <jury.gerzhedow...@gmail.com> wrote:

> Vladimir, thanks for your expert opinion.
>
> I have some thoughts about point 5.
> I tried to find out how this works for Oracle and PG:
>
> *PG*: keeps 1000 statements by default (configurable) and discards the
> least-executed statements. Statistics are updated by an asynchronous
> process, so they may lag.
>
> *Oracle*: uses the shared pool for historical data and can evict the
> records with the oldest last-execution time when free space in the shared
> pool is not enough for new data, which may not be related to historical
> statistics at all. So this also seems to be a separate asynchronous
> process (there is very little information about it).
>
> Unfortunately, I could not find information about heavy workloads and how
> these databases handle them. However, we can see that both vendors use
> asynchronous statistics processing.
>
> I see a few variants of how we can handle a very high workload.
>
> The first group of variants uses an asynchronous model with a separate
> thread that takes elements from a queue and updates the stats:
> 1) Block on the bounded queue and wait until there is enough capacity to
> put a new element.
>    + We have all statistics up to date.
>    - The end of query execution can be blocked.
> 2) Discard statistics for the finished query when the queue is full.
>    + Very fast for the current query.
>    - We lose part of the statistics.
> 3) Do a full clean of the statistics queue.
>    + Fast, and frees space for further elements.
>    - We lose a large number of statistics elements.
>
> The second group of variants uses the current approach for queryMetrics:
> we keep some additional capacity in the CHM with the history plus a
> periodic cleanup of the map. If even the additional space is not enough,
> we can:
> 1) Discard statistics for the finished query.
> 2) Do a full clean of the CHM and discard all gathered information.
>
> The first group of variants should potentially work faster, because we
> can update the history map in a single thread without contention, and
> putting into a queue should be faster.
>
> What do you think? Which of the variants is preferable, or maybe you can
> suggest another way to handle a potentially huge workload?
>
> Also, one initial question is still not clear to me - whether this is the
> right place for the new API.
>
> On Fri, Dec 21, 2018 at 1:05 PM Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
>> Hi,
>>
>> I'd propose the following approach:
>> 1) Enable history by default, because otherwise users will have to
>> restart the node to enable it, or we will have to implement dynamic
>> history enabling, which is a complex thing. The default value should be
>> relatively small, yet allow typical workloads to be accommodated, e.g.
>> 1000 entries. This should not put any serious pressure on the GC.
>> 2) Split queries by: schema, query, local flag.
>> 3) Track only growing values: execution count, error count, minimum
>> duration, maximum duration.
>> 4) Implement the ability to clear the history - JMX, SQL command,
>> whatever (maybe this is a different ticket).
>> 5) History cleanup might be implemented similarly to the current
>> approach: store everything in a CHM and periodically check its size. If
>> it is too big, evict the oldest entries. But this should be done with
>> care - under some workloads new queries will be generated very quickly,
>> and in this case we should either fall back to synchronous evicts or not
>> log history at all.
>>
>> Thoughts?
>>
>> Vladimir.
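To make the proposal above a bit more concrete, here is a minimal sketch of points 2, 3 and 5: a ConcurrentHashMap keyed by (schema, query, local flag), per-key metrics that only grow (execution count, error count, min/max duration), and a shrink step that evicts the entries with the oldest start time once the map outgrows its limit. All class and method names here (QueryHistoryKey, QueryHistoryMetrics, QueryHistory, shrink) are hypothetical and only illustrate the idea, not an actual Ignite API:

import java.util.Comparator;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Per-query history entry: only growing values, as in point 3. */
class QueryHistoryMetrics {
    private long executions;
    private long failures;
    private long minDurationMs = Long.MAX_VALUE;
    private long maxDurationMs;
    volatile long lastStartTimeMs;

    synchronized void update(long durationMs, boolean failed, long startTimeMs) {
        executions++;
        if (failed)
            failures++;
        minDurationMs = Math.min(minDurationMs, durationMs);
        maxDurationMs = Math.max(maxDurationMs, durationMs);
        lastStartTimeMs = startTimeMs;
    }
}

/** History key, as in point 2: schema + query text + local flag. */
class QueryHistoryKey {
    private final String schema;
    private final String qry;
    private final boolean loc;

    QueryHistoryKey(String schema, String qry, boolean loc) {
        this.schema = schema;
        this.qry = qry;
        this.loc = loc;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof QueryHistoryKey))
            return false;
        QueryHistoryKey k = (QueryHistoryKey)o;
        return loc == k.loc && schema.equals(k.schema) && qry.equals(k.qry);
    }

    @Override public int hashCode() {
        return Objects.hash(schema, qry, loc);
    }
}

/** CHM-based history with size-bounded eviction of the oldest entries, as in point 5. */
class QueryHistory {
    private final int maxSize;
    private final ConcurrentHashMap<QueryHistoryKey, QueryHistoryMetrics> hist = new ConcurrentHashMap<>();

    QueryHistory(int maxSize) {
        this.maxSize = maxSize;
    }

    void record(QueryHistoryKey key, long durationMs, boolean failed, long startTimeMs) {
        hist.computeIfAbsent(key, k -> new QueryHistoryMetrics()).update(durationMs, failed, startTimeMs);
    }

    /** Called periodically (or synchronously under heavy load): evict entries with the oldest start time. */
    void shrink() {
        while (hist.size() > maxSize) {
            hist.entrySet().stream()
                .min(Comparator.comparingLong(e -> e.getValue().lastStartTimeMs))
                .ifPresent(e -> hist.remove(e.getKey(), e.getValue()));
        }
    }
}

Under a heavy write rate the shrink() step is exactly the part that would have to become synchronous (or recording skipped entirely), as point 5 warns.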
>> On Fri, Dec 21, 2018 at 11:22 AM Юрий <jury.gerzhedow...@gmail.com>
>> wrote:
>>
>> > Alexey,
>> >
>> > Yes, such a property to configure the history size will be added. I
>> > think the default value should be 0, so that the history is not
>> > gathered at all by default and can be switched on via the property
>> > when it is required.
>> >
>> > Currently I plan to use the same way of evicting old data as for
>> > queryMetrics: a scheduled task will evict old data by the oldest start
>> > time of a query.
>> >
>> > Statistics will be gathered only for initial client queries, so
>> > internal queries will not be included. For identical queries we will
>> > have a single record in the history with merged statistics.
>> >
>> > All the points above are just my proposal. Please reply if you think
>> > anything should be implemented in another way.
>> >
>> > On Thu, Dec 20, 2018 at 6:23 PM Alexey Kuznetsov
>> > <akuznet...@apache.org> wrote:
>> >
>> > > Yuriy,
>> > >
>> > > I have several questions:
>> > >
>> > > Are we going to add some properties to the cluster configuration for
>> > > the history size?
>> > >
>> > > And what will be the default history size?
>> > >
>> > > Will the same queries count as the same item of historical data?
>> > >
>> > > How will we evict old data that does not fit into the history?
>> > >
>> > > Will we somehow count "reduce" queries? Or only the final "map" ones?
>> > >
>> > > --
>> > > Alexey Kuznetsov
>> >
>> > --
>> > Live with a smile! :D
>
> --
> Live with a smile! :D

--
Live with a smile! :D
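For completeness, here is a minimal sketch of the queue-based variant 2 that Юрий describes earlier in the thread: query execution never blocks, a bounded queue drops the sample when it is full, and a single collector thread owns the history map, so there is no contention on it. All names here (AsyncQueryHistoryCollector, QueryStatsEvent, the 10 000 queue capacity) are hypothetical and chosen only for illustration:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** One finished query execution waiting to be merged into the history. */
class QueryStatsEvent {
    final String schema;
    final String qry;
    final boolean loc;
    final long durationMs;
    final boolean failed;

    QueryStatsEvent(String schema, String qry, boolean loc, long durationMs, boolean failed) {
        this.schema = schema;
        this.qry = qry;
        this.loc = loc;
        this.durationMs = durationMs;
        this.failed = failed;
    }
}

/** Asynchronous collector: producers never block, a single thread owns the history map. */
class AsyncQueryHistoryCollector {
    private final BlockingQueue<QueryStatsEvent> queue = new ArrayBlockingQueue<>(10_000);

    /** A plain map is enough: only the collector thread touches it, so there is no contention. */
    private final Map<String, long[]> hist = new HashMap<>(); // key -> {execs, failures, minMs, maxMs}

    /** Called at the end of query execution. Variant 2: when the queue is full the sample is dropped. */
    void onQueryFinished(QueryStatsEvent evt) {
        queue.offer(evt); // Non-blocking; returns false (stats lost) when capacity is exhausted.
    }

    /** Starts the single consumer thread that merges events into the history map. */
    void startCollector() {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    QueryStatsEvent evt = queue.take();

                    String key = evt.schema + '|' + evt.qry + '|' + evt.loc;
                    long[] m = hist.computeIfAbsent(key, k -> new long[] {0, 0, Long.MAX_VALUE, 0});

                    m[0]++;                                // Execution count.
                    if (evt.failed)
                        m[1]++;                            // Error count.
                    m[2] = Math.min(m[2], evt.durationMs); // Min duration.
                    m[3] = Math.max(m[3], evt.durationMs); // Max duration.
                }
            }
            catch (InterruptedException ignored) {
                // Collector stopped.
            }
        }, "query-history-collector");

        t.setDaemon(true);
        t.start();
    }
}

The trade-off is exactly the one listed for variant 2: the current query is never slowed down, but under sustained overload some executions will not be reflected in the statistics.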