Re: [DISCUSS] Design for a "query log"

Josh Elser Fri, 02 Mar 2018 14:07:46 -0800

My gut reaction would be to avoid storing queries over the system tablesas they would have more noise than value, but I think this is somethingthat could be entertained!


On 3/2/18 8:02 AM, Artem Ervits wrote:

+1


Is idea here to collect information on all types of queries even against
system tables? It would be nice to keep counts of how many times a query
was executed over time. In my mssql days we also had ability to check when
was the last time update stats ran and all kinds of impact information from
that.

On Mar 2, 2018 1:39 AM, "김영우 (YoungWoo Kim)" <yw...@apache.org> wrote:

Hi Josh,

Thanks for starting this discussion. Overall the design looks good to me!

Actually I have an internal implementation for logging user queries to text
file over PQS. Let me explain our use cases. We wanted to find out
following facts using our ugly hacks:
- Running queries against PQS
- Find out "WRONG" queries

On our systems and applications, I've realized that It's hard to control
unexpected large scans, full scans or running queries without indexes
against very large table on HBase/Phoenix. Sometimes users forgot to add
the "WHERE" clause for queries :-) And also we could not find an effective
way to stop or cancel the wrong queries immediately. Therefore, we should
find out the users and heavy queries from the query logs  and then we
noticed users about the impacts of their queries -- Please try this at
home, not production cluster :-)

Definitely it would be nice to have query logging out of the box! And if
the features are implemented like your proposal, we can replace an ugly
hacks with built-in feature from our cluster.

Thanks,
- Youngwoo



On Fri, Mar 2, 2018 at 10:05 AM, Josh Elser <els...@apache.org> wrote:

Any feedback from folks? Not sure if the silence should be interpreted
as ambivalence or plain old being busy :)

On Mon, Feb 26, 2018 at 4:57 PM, Josh Elser <els...@apache.org> wrote:

Hiya,

I wanted to share this little design doc with you about some feature

work

we've been thinking about. The following is a Google doc in which

anyone

should be allowed to comment. Feel free to comment there, or here on

the

thread.

https://s.apache.org/phoenix-query-log

The high-level goal is to create a construct in which Phoenix clients

will

automatically serialize information about the queries they run to a

table

for retrospective analysis. Ideally, this information would be stored

in

Phoenix table. We want this data to help answer questions like:

* What queries are running against my system
* What specific queries started between 535AM and 620AM two days ago
* What queries are user "bob" running
* Are my user's queries effectively using the indexes in the system

Anti-goals for include:

* Cluster impact (computation/memory) usage of a query
* Query performance may be slowed to ensure all data is serialized
* A third-party service dedicated to ensuring query info is serialized

(in

the event of client failure)

Take a look at the document and let us know what you think please. I'm

happy

to try to explain this in greater detail.

- Josh (on behalf of myself, Ankit, Rajeshbabu, and Sergey)

Re: [DISCUSS] Design for a "query log"

Reply via email to