[ 
https://issues.apache.org/jira/browse/PHOENIX-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668156#comment-16668156
 ] 

Karan Mehta commented on PHOENIX-4999:
--------------------------------------

Yes [~Bin Shi], agreed with all of that. These optimizations are specific to 
multi-tenant tables. However, the phoenix-client currently has no logic for 
these scenarios and optimizations. Also, in general it is better to rely less 
on the client side to keep track of data drift and other data-related 
information.

I also agree that updating stats via MR or SQL can also result in partial stats 
if failures occur. However, the emphasis of this Jira is to eliminate one of 
the ways this can happen. Until we refactor the code to make these 
optimizations pluggable (and implement them for multi-tenant tables 
specifically), we can use this Jira to ensure that users are not allowed to 
run such statements. PHOENIX-4333 guards the client against missing/partial 
statistics, but I believe this is a simple approach to prevent this from 
happening in the first place. Any thoughts [~Bin Shi]?
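
A minimal sketch of the guard proposed in the issue description (throw an 
UnsupportedOperationException when update statistics is attempted on a 
tenant-specific connection). SketchConnection and UpdateStatsGuard are 
hypothetical stand-ins used only for illustration; they are not Phoenix 
classes, and where the real check would live in the client is an assumption.

```java
/** Hypothetical stand-in for a connection that may carry a tenant id. */
final class SketchConnection {
    // null means a global (non-tenant) connection; the field is final,
    // mirroring the point that the tenant id cannot be swapped out later.
    private final String tenantId;

    SketchConnection(String tenantId) {
        this.tenantId = tenantId;
    }

    String getTenantId() {
        return tenantId;
    }
}

/** Hypothetical guard, not part of Phoenix. */
final class UpdateStatsGuard {
    private UpdateStatsGuard() {}

    /** Rejects the statement up front when the connection is tenant-specific. */
    static void checkAllowed(SketchConnection conn) {
        if (conn.getTenantId() != null) {
            throw new UnsupportedOperationException(
                "UPDATE STATISTICS is not supported on a tenant-specific connection");
        }
    }
}
```

With a check like this, the statement fails before any scan is issued, so no 
partial stats can be written from this path.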

> Update statistics should not be allowed on tenant specific connection
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-4999
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4999
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>            Priority: Major
>
> The update statistics SQL can trigger partial stats collection when run 
> using a tenant-specific connection. Normally, update statistics internally 
> runs scans on all the regions of the table. In a tenant-specific connection, 
> the tenantId bounds the startKey and endKey of the scans, which can cause 
> stats to run only on specific regions and result in partial stats collection. 
> Since the view data and table data reside in the same physical HBase table, 
> it doesn't make sense to allow users to run stats for specific tenants as 
> tenants may span across regions. The issue was first identified in 
> PHOENIX-4333.
> The patch however doesn't fully stop the SQL from running. Multiple 
> approaches can be taken here. 
>  # Unset the tenantId on the connection before update statistics is run and 
> set it back afterwards. This can be tricky and error-prone to implement since 
> tenantId is essentially a final field on PhoenixConnection.
>  # As [~tdsilva] pointed out, we can throw an UnsupportedOperationException() 
> whenever a user tries to update statistics on a tenant-specific connection.
> The second option seems straightforward to implement and can prevent 
> accidental usage of this sql.
> [~Bin Shi] [~sukumaddineni] Any thoughts here?
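
To illustrate why a tenant-bounded scan yields partial stats, here is a toy 
sketch that works out which regions a key-bounded scan touches. The Region 
class, the single-letter keys, and the scan bounds are made up for 
illustration; real region boundaries and tenant key ranges will differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not Phoenix or HBase code. */
final class TenantScanSketch {
    private TenantScanSketch() {}

    /** Toy region covering [startKey, endKey); an empty key means unbounded. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Returns the regions that overlap the scan range [scanStart, scanStop). */
    static List<Region> regionsTouched(List<Region> regions,
                                       String scanStart, String scanStop) {
        List<Region> hit = new ArrayList<>();
        for (Region r : regions) {
            // An empty start key sorts before any non-empty stop key.
            boolean startsBeforeScanEnds = r.startKey.compareTo(scanStop) < 0;
            // An empty end key means the region runs to the end of the table.
            boolean endsAfterScanStarts =
                r.endKey.isEmpty() || r.endKey.compareTo(scanStart) > 0;
            if (startsBeforeScanEnds && endsAfterScanStarts) {
                hit.add(r);
            }
        }
        return hit;
    }
}
```

For a table split at "b", "m" and "t", a scan bounded to ["c", "d") overlaps 
only the ["b", "m") region, so stats would be collected for one region out of 
four.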



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
