[
https://issues.apache.org/jira/browse/PHOENIX-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429814#comment-16429814
]
Andrew Purtell commented on PHOENIX-2715:
-----------------------------------------
Random thoughts on trying to use this in a production setting
* Cool to have a LogWriter that puts the log into a table, so the log itself
can be queried. Powerful. How about a LogWriter that just emits to Java logging
as well? HBase+Phoenix systems throw off a ton of this type of logging, so we
already need a solution for managing it, for which query log would just be a
new subset. Many places may want their log search solution to be based on
something else (Splunk, Elastic, Solr, etc.).
* If not an alternate implementation of LogWriter, at least a better
factoring. Make LogWriter abstract or an interface. That should be quickly
accomplished.
* What happens if query logging becomes too expensive? We can turn it all the
way on and all the way off. Can we have a knob for probabilistic sampling? This
is really easy to implement. Add one config parameter, a float or double, one
that can ideally be changed dynamically. Call it something like
QUERY_LOG_SAMPLE_RATE (not a great name but whatever). In the code where you go
to do the query logging, add a conditional {{if
(ThreadLocalRandom.current().nextDouble() <
getConfig(QUERY_LOG_SAMPLE_RATE))}}. Easy. So if logging 100% of queries is
too expensive (at QUERY_LOG_SAMPLE_RATE = 1.0), we can try logging 50% of them
(at QUERY_LOG_SAMPLE_RATE = 0.5), or 10% of them (at QUERY_LOG_SAMPLE_RATE =
0.1), or 1% of them (at QUERY_LOG_SAMPLE_RATE = 0.01).
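
To make the suggestions above concrete, here is a minimal sketch of how a pluggable writer and a sample-rate knob could fit together. Everything named here is hypothetical: QueryLogWriter, JulQueryLogWriter, SamplingQueryLogWriter and the "query.log" logger name are illustrative and are not taken from the attached patch or from Phoenix's actual LogWriter. The only real APIs used are java.util.logging.Logger and ThreadLocalRandom.current().nextDouble().

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.logging.Logger;

/** Hypothetical writer abstraction; not the LogWriter from the patch. */
interface QueryLogWriter {
    void write(String queryText);
}

/** Emits query log entries to java.util.logging instead of a table. */
final class JulQueryLogWriter implements QueryLogWriter {
    // Logger name is an assumption chosen for this sketch.
    private static final Logger LOG = Logger.getLogger("query.log");

    @Override
    public void write(String queryText) {
        LOG.info(queryText);
    }
}

/** Wraps another writer and forwards only a sampled fraction of entries. */
final class SamplingQueryLogWriter implements QueryLogWriter {
    private final QueryLogWriter delegate;
    // volatile so a new rate set at runtime is visible to all threads
    private volatile double sampleRate;

    SamplingQueryLogWriter(QueryLogWriter delegate, double sampleRate) {
        this.delegate = delegate;
        setSampleRate(sampleRate);
    }

    void setSampleRate(double sampleRate) {
        // The negated form also rejects NaN.
        if (!(sampleRate >= 0.0 && sampleRate <= 1.0)) {
            throw new IllegalArgumentException(
                "sample rate must be in [0.0, 1.0]: " + sampleRate);
        }
        this.sampleRate = sampleRate;
    }

    @Override
    public void write(String queryText) {
        // nextDouble() returns a value in [0.0, 1.0), so a strict '<'
        // forwards nothing at rate 0.0 and everything at rate 1.0.
        if (ThreadLocalRandom.current().nextDouble() < sampleRate) {
            delegate.write(queryText);
        }
    }
}
```

With this shape, something like new SamplingQueryLogWriter(new JulQueryLogWriter(), 0.1) would send roughly 10% of queries to Java logging, and a table-backed writer could be swapped in behind the same interface. How the rate is read from configuration and refreshed dynamically is left open here.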
> Query Log
> ---------
>
> Key: PHOENIX-2715
> URL: https://issues.apache.org/jira/browse/PHOENIX-2715
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Nick Dimiduk
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: PHOENIX-2715.patch, PHOENIX-2715_master.patch,
> PHOENIX-2715_master_V1.patch
>
>
> One useful feature of other database systems is the query log. It allows the
> DBA to review the queries run, who's run them, time taken, &c. This serves
> both as an audit and also as a source of "ground truth" for performance
> optimization. For instance, which columns should be indexed. It may also
> serve as the foundation for automated performance recommendations/actions.
> What queries are being run is the first piece. Having this data tied into
> tracing results and perhaps client-side metrics (PHOENIX-1819) becomes very
> useful.
> This might take the form of clients writing data to a new system table, but
> other implementation suggestions are welcome.