[ 
https://issues.apache.org/jira/browse/PHOENIX-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239017#comment-14239017
 ] 

Jan Fernando commented on PHOENIX-1452:
---------------------------------------

I changed the summary to better reflect the primary goal I wanted to address, 
which is to provide more insight into client-side Phoenix behavior. 
[~apurtell] Since I intended this to focus on client-side metrics, we can't 
necessarily leverage the HBase time series metrics infrastructure. My thought 
was to provide client-side metrics that are agnostic about which 
time series/graphing/metrics solution is used. Callers would read these 
metrics at a given interval, reset the counters, and then ship the values to 
whatever metrics system they use for capturing this kind of operational data 
(e.g. Graphite, Cacti, Ganglia).
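
To make the read-then-reset idea concrete, here is a minimal sketch of what 
such a counter registry could look like. ClientMetrics, increment() and 
snapshotAndReset() are hypothetical names for illustration only; nothing like 
this exists in Phoenix today, and the real design may differ.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side counter registry (not an existing Phoenix class).
final class ClientMetrics {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Called from client code paths, e.g. once per scan or put.
    void increment(String name) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    // Returns the value of every counter and resets each one to zero,
    // so each snapshot covers exactly one reporting interval.
    // getAndSet(0) is atomic per counter, so no increment is lost.
    Map<String, Long> snapshotAndReset() {
        Map<String, Long> snapshot = new TreeMap<>();
        counters.forEach((name, value) -> snapshot.put(name, value.getAndSet(0)));
        return snapshot;
    }
}
```

A caller would invoke snapshotAndReset() from its own scheduler and forward 
the returned map to Graphite, Ganglia, or whatever system it uses, which keeps 
Phoenix itself independent of any particular metrics backend.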

Thinking about this some more I think there are two distinct themes here:
1) A single summary log-line per query that logs some summary information about 
the query.
2) Capturing global client metrics that clients of Phoenix can get from the 
PhoenixRuntime

Spelling out some initial thoughts on each of these:

#1 Summary Loglines
This data would be interesting to capture per SQL statement:
Execution Time
Number of parallel tasks submitted
Number of HBase scans invoked
Number of Batches of data committed to HBase
Boolean of whether data was spooled to disk
Amount of heap allocated to process the SQL
Number of results returned/rows upserted
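
As a rough illustration of the logline, the fields above could be collected 
in a small value object and rendered as key=value pairs. QuerySummary and its 
field names are assumptions made for this sketch, not an existing Phoenix 
class or an agreed log format.

```java
// Hypothetical per-statement summary (not an existing Phoenix class).
final class QuerySummary {
    final long executionTimeMs;
    final int parallelTasks;
    final int hbaseScans;
    final int batchesCommitted;
    final boolean spooledToDisk;
    final long heapBytes;
    final long rows; // results returned or rows upserted

    QuerySummary(long executionTimeMs, int parallelTasks, int hbaseScans,
                 int batchesCommitted, boolean spooledToDisk, long heapBytes, long rows) {
        this.executionTimeMs = executionTimeMs;
        this.parallelTasks = parallelTasks;
        this.hbaseScans = hbaseScans;
        this.batchesCommitted = batchesCommitted;
        this.spooledToDisk = spooledToDisk;
        this.heapBytes = heapBytes;
        this.rows = rows;
    }

    // Renders one line of space-separated key=value pairs, easy to grep and parse.
    String toLogLine() {
        return "executionTimeMs=" + executionTimeMs
            + " parallelTasks=" + parallelTasks
            + " hbaseScans=" + hbaseScans
            + " batchesCommitted=" + batchesCommitted
            + " spooledToDisk=" + spooledToDisk
            + " heapBytes=" + heapBytes
            + " rows=" + rows;
    }
}
```

The client would emit toLogLine() once per statement through its normal 
logger, so a single line carries everything needed to correlate a slow query 
with its parallelism and memory use.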

#2 Global metrics
The following would be useful to track and make available for clients to 
extract into their metrics system of choice at a configurable interval:
Number of Spool Files Written to Disk 
Maximum Number of threads from thread pool used 
Histogram of Number of threads from thread pool used (configurable percentiles)
Histogram of wait times of queued tasks (configurable percentiles)
Min, max, average percentage of max memory allocated by the GlobalMemoryManager 
during a configurable interval
Histogram of number of tasks
Number of HBase Client Scans invoked
Number of HBase Client Puts invoked
Number of HBase Client Gets invoked
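
For the histogram items, here is a minimal sketch of how configurable 
percentiles could be computed over recorded wait times, using the 
nearest-rank method. WaitTimeHistogram is a hypothetical name, and keeping 
every sample in memory is a simplification; a real implementation would 
likely use a bounded reservoir or a metrics library instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical histogram of queued-task wait times (not an existing Phoenix class).
final class WaitTimeHistogram {
    private final List<Long> samples = new ArrayList<>();

    // Record how long one task waited in the queue, in milliseconds.
    synchronized void record(long waitMs) {
        samples.add(waitMs);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    synchronized long percentile(double p) {
        if (samples.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(rank - 1);
    }

    // Clear all samples at the end of a reporting interval.
    synchronized void reset() {
        samples.clear();
    }
}
```

The percentiles to report (for example 50, 90, 99) would come from client 
configuration, and reset() would be called after each interval is shipped.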

 [~jamestaylor] Do you think I should dupe out 
https://issues.apache.org/jira/browse/PHOENIX-1120 to this JIRA and break these 
ideas into subtasks?

> Add Phoenix client-side logging and capture resource utilization metrics
> ------------------------------------------------------------------------
>
>                 Key: PHOENIX-1452
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1452
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.2
>            Reporter: Jan Fernando
>
> For performance testing and tuning of features that use Phoenix and for 
> production monitoring it would be really helpful to easily be able to extract 
> statistics about Phoenix's client-side Thread Pool and Queue Depth usage to 
> help with tuning and being able to correlate the impact of tuning these 2 
> parameters to query performance.
> For global per JVM logging one of the following would meet my needs, with a 
> preference for #2:
> 1. A simple log line that logs the data in ThreadPoolExecutor.toString() 
> at a configurable interval
> 2. Exposing the ThreadPoolExecutor metrics in PhoenixRuntime or other global 
> client exposed class and allow client to do their own logging.
> In addition to this it would also be really valuable to have a single log 
> line per query that provides statistics about the level of parallelism i.e. 
> number of parallel scans being executed. I don't need full explain plan level of 
> data but a good heuristic to be able to track over time how queries are 
> utilizing the thread pool as data size grows etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
