Hi, I have a question related to the subject. What do you think: should we also track EXPLAIN queries? I see reasons both to skip them and to include them in the history:
Pros: we would have the full picture and see all queries.
Cons: such queries can be considered investigation/debug/service queries and can push real queries out of the history.

What is your opinion?

On Fri, Dec 21, 2018 at 7:23 PM Юрий <jury.gerzhedow...@gmail.com> wrote:

> Vladimir, thanks for your expert opinion.
>
> I have some thoughts about point 5.
> I tried to find out how this works for Oracle and PG:
>
> *PG*: keeps 1000 statements by default (configurable) and discards the
> least-executed statements. Statistics are updated by an asynchronous
> process, so they may lag.
>
> *Oracle*: uses the shared pool for historical data and can evict the
> records with the oldest last-execution time when free space in the shared
> pool is not enough for new data, which may not be related to historical
> statistics at all. So this also seems to be a separate asynchronous
> process (there is very little information about it).
>
> Unfortunately, I could not find information about heavy workloads and how
> these databases handle them. However, we can see that both vendors use
> asynchronous statistics processing.
>
> I see a few variants of how we can handle a very high workload.
>
> The first group of variants uses an asynchronous model with a separate
> thread that takes elements from a queue and updates the stats:
> 1) Block on the bounded queue and wait until there is enough capacity to
> put a new element.
>    + We have all statistics up to date.
>    - The end of query execution can be blocked.
> 2) Discard statistics for the finished query when the queue is full.
>    + Very fast for the current query.
>    - We lose part of the statistics.
> 3) Do a full clean of the statistics queue.
>    + Fast, and frees space for further elements.
>    - We lose a large number of statistics elements.
>
> The second group of variants uses the current approach for queryMetrics:
> we keep some additional capacity in the CHM with the history plus a
> periodic cleanup of the map. If even the additional space is not enough,
> we can:
> 1) Discard statistics for the finished query.
> 2) Do a full clean of the CHM and discard all gathered information.
>
> The first group of variants should potentially work faster, because we
> can update the history map in a single thread without contention, and
> putting into a queue should be faster.
>
> What do you think? Which of the variants is preferable, or maybe you can
> suggest another way to handle a potentially huge workload?
>
> Also, one initial question is still not clear to me - whether this is the
> right place for the new API.
>
> On Fri, Dec 21, 2018 at 1:05 PM Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
>> Hi,
>>
>> I'd propose the following approach:
>> 1) Enable history by default, because otherwise users will have to
>> restart the node to enable it, or we will have to implement dynamic
>> history enabling, which is a complex thing. The default value should be
>> relatively small, yet allow typical workloads to be accommodated, e.g.
>> 1000 entries. This should not put any serious pressure on the GC.
>> 2) Split queries by: schema, query, local flag.
>> 3) Track only growing values: execution count, error count, minimum
>> duration, maximum duration.
>> 4) Implement the ability to clear the history - JMX, SQL command,
>> whatever (maybe this is a different ticket).
>> 5) History cleanup might be implemented similarly to the current
>> approach: store everything in a CHM and periodically check its size. If
>> it is too big, evict the oldest entries. But this should be done with
>> care - under some workloads new queries will be generated very quickly,
>> and in this case we should either fall back to synchronous evicts or not
>> log history at all.
>>
>> Thoughts?
>>
>> Vladimir.
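To make the proposal above a bit more concrete, here is a minimal sketch of points 2, 3 and 5: a ConcurrentHashMap keyed by (schema, query, local flag), per-key metrics that only grow (execution count, error count, min/max duration), and a shrink step that evicts the entries with the oldest start time once the map outgrows its limit. All class and method names here (QueryHistoryKey, QueryHistoryMetrics, QueryHistory, shrink) are hypothetical and only illustrate the idea, not an actual Ignite API:

import java.util.Comparator;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Per-query history entry: only growing values, as in point 3. */
class QueryHistoryMetrics {
    private long executions;
    private long failures;
    private long minDurationMs = Long.MAX_VALUE;
    private long maxDurationMs;
    volatile long lastStartTimeMs;

    synchronized void update(long durationMs, boolean failed, long startTimeMs) {
        executions++;
        if (failed)
            failures++;
        minDurationMs = Math.min(minDurationMs, durationMs);
        maxDurationMs = Math.max(maxDurationMs, durationMs);
        lastStartTimeMs = startTimeMs;
    }
}

/** History key, as in point 2: schema + query text + local flag. */
class QueryHistoryKey {
    private final String schema;
    private final String qry;
    private final boolean loc;

    QueryHistoryKey(String schema, String qry, boolean loc) {
        this.schema = schema;
        this.qry = qry;
        this.loc = loc;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof QueryHistoryKey))
            return false;
        QueryHistoryKey k = (QueryHistoryKey)o;
        return loc == k.loc && schema.equals(k.schema) && qry.equals(k.qry);
    }

    @Override public int hashCode() {
        return Objects.hash(schema, qry, loc);
    }
}

/** CHM-based history with size-bounded eviction of the oldest entries, as in point 5. */
class QueryHistory {
    private final int maxSize;
    private final ConcurrentHashMap<QueryHistoryKey, QueryHistoryMetrics> hist = new ConcurrentHashMap<>();

    QueryHistory(int maxSize) {
        this.maxSize = maxSize;
    }

    void record(QueryHistoryKey key, long durationMs, boolean failed, long startTimeMs) {
        hist.computeIfAbsent(key, k -> new QueryHistoryMetrics()).update(durationMs, failed, startTimeMs);
    }

    /** Called periodically (or synchronously under heavy load): evict entries with the oldest start time. */
    void shrink() {
        while (hist.size() > maxSize) {
            hist.entrySet().stream()
                .min(Comparator.comparingLong(e -> e.getValue().lastStartTimeMs))
                .ifPresent(e -> hist.remove(e.getKey(), e.getValue()));
        }
    }
}

Under a heavy write rate the shrink() step is exactly the part that would have to become synchronous (or recording skipped entirely), as point 5 warns.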
>> On Fri, Dec 21, 2018 at 11:22 AM Юрий <jury.gerzhedow...@gmail.com>
>> wrote:
>>
>> > Alexey,
>> >
>> > Yes, such a property to configure the history size will be added. I
>> > think the default value should be 0, so that the history is not
>> > gathered at all by default and can be switched on via the property
>> > when it is required.
>> >
>> > Currently I plan to use the same way of evicting old data as for
>> > queryMetrics: a scheduled task will evict old data by the oldest start
>> > time of a query.
>> >
>> > Statistics will be gathered only for initial client queries, so
>> > internal queries will not be included. For identical queries we will
>> > have a single record in the history with merged statistics.
>> >
>> > All the points above are just my proposal. Please reply if you think
>> > anything should be implemented in another way.
>> >
>> > On Thu, Dec 20, 2018 at 6:23 PM Alexey Kuznetsov
>> > <akuznet...@apache.org> wrote:
>> >
>> > > Yuriy,
>> > >
>> > > I have several questions:
>> > >
>> > > Are we going to add some properties to the cluster configuration for
>> > > the history size?
>> > >
>> > > And what will be the default history size?
>> > >
>> > > Will the same queries count as the same item of historical data?
>> > >
>> > > How will we evict old data that does not fit into the history?
>> > >
>> > > Will we somehow count "reduce" queries? Or only the final "map" ones?
>> > >
>> > > --
>> > > Alexey Kuznetsov
>> >
>> > --
>> > Live with a smile! :D
>
> --
> Live with a smile! :D

--
Live with a smile! :D
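For completeness, here is a minimal sketch of the queue-based variant 2 that Юрий describes earlier in the thread: query execution never blocks, a bounded queue drops the sample when it is full, and a single collector thread owns the history map, so there is no contention on it. All names here (AsyncQueryHistoryCollector, QueryStatsEvent, the 10 000 queue capacity) are hypothetical and chosen only for illustration:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** One finished query execution waiting to be merged into the history. */
class QueryStatsEvent {
    final String schema;
    final String qry;
    final boolean loc;
    final long durationMs;
    final boolean failed;

    QueryStatsEvent(String schema, String qry, boolean loc, long durationMs, boolean failed) {
        this.schema = schema;
        this.qry = qry;
        this.loc = loc;
        this.durationMs = durationMs;
        this.failed = failed;
    }
}

/** Asynchronous collector: producers never block, a single thread owns the history map. */
class AsyncQueryHistoryCollector {
    private final BlockingQueue<QueryStatsEvent> queue = new ArrayBlockingQueue<>(10_000);

    /** A plain map is enough: only the collector thread touches it, so there is no contention. */
    private final Map<String, long[]> hist = new HashMap<>(); // key -> {execs, failures, minMs, maxMs}

    /** Called at the end of query execution. Variant 2: when the queue is full the sample is dropped. */
    void onQueryFinished(QueryStatsEvent evt) {
        queue.offer(evt); // Non-blocking; returns false (stats lost) when capacity is exhausted.
    }

    /** Starts the single consumer thread that merges events into the history map. */
    void startCollector() {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    QueryStatsEvent evt = queue.take();

                    String key = evt.schema + '|' + evt.qry + '|' + evt.loc;
                    long[] m = hist.computeIfAbsent(key, k -> new long[] {0, 0, Long.MAX_VALUE, 0});

                    m[0]++;                                // Execution count.
                    if (evt.failed)
                        m[1]++;                            // Error count.
                    m[2] = Math.min(m[2], evt.durationMs); // Min duration.
                    m[3] = Math.max(m[3], evt.durationMs); // Max duration.
                }
            }
            catch (InterruptedException ignored) {
                // Collector stopped.
            }
        }, "query-history-collector");

        t.setDaemon(true);
        t.start();
    }
}

The trade-off is exactly the one listed for variant 2: the current query is never slowed down, but under sustained overload some executions will not be reflected in the statistics.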