[jira] [Commented] (PHOENIX-4009) Run UPDATE STATISTICS command by using MR integration on snapshots

Karan Mehta (JIRA) Thu, 18 Oct 2018 18:06:56 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656116#comment-16656116
 ]


Karan Mehta commented on PHOENIX-4009:
--------------------------------------

This is definitely a better way of collecting statistics, since it reduces the 
load on the HRegionServer process.

The concerns are as follows.
 # Currently the Phoenix MR framework only supports non-aggregate queries. 
{{UPDATE STATISTICS}} is a mutation statement, although its internal 
implementation boils down to running {{SELECT COUNT(*) FROM TABLENAME}}, which 
carries a special attribute {{_ANALYZETABLE}}, to get the parallel scans. 
Since that query is an aggregation and the MR framework does not support it, 
it should be okay to change the logic to {{SELECT * FROM TABLENAME}}, after 
checking the implications of doing so.
 # A statistics update can be issued by multiple clients or applications at the 
same time, or it can be triggered by a major compaction. At present we prevent 
multiple concurrent statistics runs with an HRegionServer-level singleton 
class that maintains a list of the regions currently collecting statistics. 
Updating the SYSTEM.STATS table for a region happens atomically. A potential 
solution is to use the SYSTEM.MUTEX table to acquire a time-based lock on the 
region that is collecting stats, so that other apps can check it before 
proceeding. If the process dies midway, the lock has a TTL and will be removed 
automatically. The only potential concern is a collection that takes longer 
than the TTL. We can set the TTL to a generous value (such as 1 hour) so that 
we are unlikely to run into this case.
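
To make the second point concrete, here is a rough sketch of the time-based 
lock semantics. It is only an in-memory model in Python, not Phoenix code: the 
class TtlLockTable, its methods and the region/client names are all 
hypothetical, and it does not touch SYSTEM.MUTEX or any Phoenix/HBase API. It 
only shows that a lock held by a dead process stops blocking others once the 
TTL has passed.

```python
import time
from typing import Callable, Dict, Tuple


class TtlLockTable:
    """Hypothetical in-memory stand-in for a lock table whose rows expire."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be simulated
        # region name -> (owner, expiry time)
        self._rows: Dict[str, Tuple[str, float]] = {}

    def try_acquire(self, region: str, owner: str) -> bool:
        """Take the lock unless an unexpired lock row already exists."""
        now = self._clock()
        row = self._rows.get(region)
        if row is not None and row[1] > now:
            return False  # someone is still collecting stats for this region
        # No row, or the previous holder's row has outlived its TTL.
        self._rows[region] = (owner, now + self._ttl)
        return True

    def release(self, region: str, owner: str) -> bool:
        """Drop the lock; False means it expired or was taken over."""
        row = self._rows.get(region)
        if row is not None and row[0] == owner and row[1] > self._clock():
            del self._rows[region]
            return True
        return False
```

A client would call try_acquire before starting collection for a region and 
release when done. A release that returns False is the "collection took longer 
than the TTL" case: the lock expired while the work was still running, so 
another client may have started collecting for the same region.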

I am still exploring this and will keep posting my thoughts here.

FYI [~Bin Shi] [~sukumaddineni]

> Run UPDATE STATISTICS command by using MR integration on snapshots
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4009
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4009
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Priority: Major
>
> Now that we have the capability to run queries against table snapshots 
> through our map reduce integration, we can utilize this capability for stats 
> collection too. This would make our stats collection more resilient, resource 
> aware and less resource intensive. The bulk of the plumbing is already in 
> place. We would need to make sure that the integration doesn't barf when the 
> query is an UPDATE STATISTICS command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
