[
https://issues.apache.org/jira/browse/PHOENIX-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657537#comment-16657537
]
Bin Shi commented on PHOENIX-4009:
----------------------------------
We may not need to synchronize MR jobs with "UPDATE STATISTICS ..." SQL
commands (or even Update Statistics MR jobs with each other), whether through
SYSTEM.MUTEX or a distributed lock via ZooKeeper. The reasons are:
# Such locking would add many failure scenarios and corner cases to handle.
# Once we use an MR job to update statistics, the "UPDATE STATISTICS ..." SQL
command shouldn't be run that frequently, so the chance of both running
simultaneously is low.
# Since updating the SYSTEM.STATS table for a region happens atomically, it's
OK for them to run simultaneously.
# The MR job doesn't run in the region server's process space and its resources
are controlled by YARN, so it should be OK for them to run simultaneously.
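
To illustrate the atomicity argument, here is a minimal in-memory sketch, not
Phoenix code. It assumes each writer replaces all stats rows for a region in
one atomic step. StatsTable, replaceRegionStats and the "mr"/"sql" writer names
are hypothetical stand-ins for SYSTEM.STATS and the two update paths. Under
that assumption, two unsynchronized writers leave each region holding the
complete output of exactly one of them (last writer wins), never a mix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StatsRaceSketch {
    // Hypothetical stand-in for SYSTEM.STATS: region name -> guidepost list.
    static final class StatsTable {
        private final Map<String, List<String>> rows = new ConcurrentHashMap<>();

        // Assumption: all stats for one region are swapped in a single atomic step.
        void replaceRegionStats(String region, List<String> guideposts) {
            rows.put(region, List.copyOf(guideposts));
        }

        Map<String, List<String>> snapshot() {
            return Map.copyOf(rows);
        }
    }

    // Builds the guideposts one writer would produce for a region.
    static List<String> collect(String writer, String region, int count) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(writer + ":" + region + ":gp" + i);
        }
        return out;
    }

    // Runs an "mr" writer and a "sql" writer concurrently with no locking.
    static Map<String, List<String>> runConcurrentWriters(int regions, int guidepostsPerRegion) {
        StatsTable table = new StatsTable();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (String writer : List.of("mr", "sql")) {
            pool.submit(() -> {
                for (int r = 0; r < regions; r++) {
                    String region = "region-" + r;
                    table.replaceRegionStats(region, collect(writer, region, guidepostsPerRegion));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.snapshot();
    }

    // True when every guidepost in the list came from the same writer.
    static boolean fromSingleWriter(List<String> guideposts) {
        String first = guideposts.get(0);
        String prefix = first.substring(0, first.indexOf(':') + 1);
        return guideposts.stream().allMatch(g -> g.startsWith(prefix));
    }

    public static void main(String[] args) {
        runConcurrentWriters(4, 3).forEach((region, gps) ->
            System.out.println(region + " -> " + gps + " consistent=" + fromSingleWriter(gps)));
    }
}
```

Which writer wins a given region varies from run to run, but each region's
list always comes from a single writer. The sketch covers only the atomicity
point; it says nothing about resource isolation under YARN.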
> Run UPDATE STATISTICS command by using MR integration on snapshots
> ------------------------------------------------------------------
>
> Key: PHOENIX-4009
> URL: https://issues.apache.org/jira/browse/PHOENIX-4009
> Project: Phoenix
> Issue Type: Bug
> Reporter: Samarth Jain
> Priority: Major
>
> Now that we have the capability to run queries against table snapshots
> through our map reduce integration, we can utilize this capability for stats
> collection too. This would make our stats collection more resilient, resource
> aware and less resource intensive. The bulk of the plumbing is already in
> place. We would need to make sure that the integration doesn't barf when the
> query is an UPDATE STATISTICS command.