[
https://issues.apache.org/jira/browse/PHOENIX-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670240#comment-16670240
]
Bin Shi commented on PHOENIX-4999:
----------------------------------
[~karanmehta93], I have been rethinking this. *We don't need to disable "Update
statistics with tenant specific connection"*. Before any tenant user starts to
use stats (e.g. "EXPLAIN SELECT COUNT(*) ..."), the user should have updated
statistics with a tenant-specific connection at least once, and most likely our
system has already run "UPDATE STATISTICS" at the table level, either by SQL
command or by MR jobs. So *in the real world, the case* "clean all stats and
cache, then update statistics for tenant 1, so tenant 2 has partial stats which
only cover 1 row out of total 6 rows" defined in the E2E test case
testPartialStatsForTenantViews() in ExplainPlanWithStatsEnabledIT.java *will
never happen*.
Regarding resiliency, we have only one negative case: failures while updating
statistics from a region server. That case can be overcome by retrying, and it
is orthogonal to updating statistics with a tenant-specific connection.
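To make the partial-stats concern concrete, here is a toy model, not Phoenix code. It assumes row keys begin with the tenant id (as in a multi-tenant table whose leading primary-key column is the tenant id), models regions as hypothetical string key ranges, and approximates a tenant-specific scan as the key range covering that tenant's prefix, ignoring separator and encoding details. The class, record, method names, and region split are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class TenantScanSketch {

    /** Hypothetical region covering row keys in [startKey, endKey); an empty endKey means "end of table". */
    record Region(String name, String startKey, String endKey) {
        /** True if this region shares any key with the scan range [scanStart, scanStop). */
        boolean overlaps(String scanStart, String scanStop) {
            boolean endsBeforeScan = !endKey.isEmpty() && endKey.compareTo(scanStart) <= 0;
            boolean startsAfterScan = startKey.compareTo(scanStop) >= 0;
            return !endsBeforeScan && !startsAfterScan;
        }
    }

    /** Illustrative split: tenant2's rows straddle the R2/R3 boundary. */
    static final List<Region> SAMPLE_REGIONS = List.of(
            new Region("R1", "", "tenant2"),
            new Region("R2", "tenant2", "tenant2x"),
            new Region("R3", "tenant2x", "tenant3"),
            new Region("R4", "tenant3", ""));

    /** Exclusive upper bound for keys with this prefix; assumes a non-empty prefix whose last char is not Character.MAX_VALUE. */
    static String nextPrefix(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    /** Names of the regions a scan bounded to one tenant's key prefix would visit. */
    static List<String> regionsScanned(List<Region> regions, String tenantId) {
        String scanStart = tenantId;
        String scanStop = nextPrefix(tenantId);
        List<String> visited = new ArrayList<>();
        for (Region r : regions) {
            if (r.overlaps(scanStart, scanStop)) {
                visited.add(r.name());
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Only regions overlapping the tenant's key range are scanned, so stats from such a run are partial.
        System.out.println(regionsScanned(SAMPLE_REGIONS, "tenant1")); // prints [R1]
        System.out.println(regionsScanned(SAMPLE_REGIONS, "tenant2")); // prints [R2, R3]
    }
}
```

With this hypothetical split, a stats run bounded to tenant1 visits only R1, leaving R2 through R4 without stats from that run, while tenant2's rows alone span two regions.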
> Update statistics should not be allowed on tenant specific connection
> ---------------------------------------------------------------------
>
> Key: PHOENIX-4999
> URL: https://issues.apache.org/jira/browse/PHOENIX-4999
> Project: Phoenix
> Issue Type: Bug
> Reporter: Karan Mehta
> Assignee: Karan Mehta
> Priority: Major
>
> The UPDATE STATISTICS SQL can trigger partial stats collection when run
> using a tenant-specific connection. Normally, UPDATE STATISTICS internally
> runs scans on all the regions of the table. On a tenant-specific connection,
> the tenantId bounds the scans' startKey and endKey, which can cause stats
> collection to run only on specific regions and result in partial stats.
> Since the view data and table data reside in the same physical HBase table,
> it doesn't make sense to allow users to run stats for specific tenants, as a
> tenant's rows may span multiple regions. The issue was first identified in
> PHOENIX-4333.
> The patch, however, doesn't fully stop the SQL from running. Multiple
> approaches can be taken here.
> # Unset the tenantId on the connection before UPDATE STATISTICS runs and
> restore it afterwards. This is tricky and error-prone to implement, since
> tenantId is essentially a final field on PhoenixConnection.
> # As [~tdsilva] pointed out, we can throw an UnsupportedOperationException()
> whenever a user tries to update statistics on a tenant-specific connection.
> The second option seems straightforward to implement and can prevent
> accidental usage of this SQL.
> [~Bin Shi] [~sukumaddineni] Any thoughts here?
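The second option from the issue description could look roughly like the following sketch. It does not use the Phoenix API: `Conn` is a hypothetical stand-in for PhoenixConnection in which a present tenantId marks a tenant-specific connection, and `checkUpdateStatisticsAllowed` is an illustrative name for a check that would run before any stats scans are issued. Where such a check would live in Phoenix, and whether it would surface as UnsupportedOperationException or as a SQLException, is left open here.

```java
import java.util.Optional;

public class UpdateStatsGuardSketch {

    /** Hypothetical stand-in for a connection; a present tenantId marks a tenant-specific connection. */
    record Conn(Optional<String> tenantId) {}

    /** Rejects UPDATE STATISTICS on tenant-specific connections (option 2 in the issue). */
    static void checkUpdateStatisticsAllowed(Conn conn) {
        if (conn.tenantId().isPresent()) {
            throw new UnsupportedOperationException(
                    "UPDATE STATISTICS is not supported on a tenant-specific connection (tenantId="
                            + conn.tenantId().get() + ")");
        }
    }

    public static void main(String[] args) {
        // Global (non-tenant) connection: the check passes silently.
        checkUpdateStatisticsAllowed(new Conn(Optional.empty()));
        try {
            // Tenant-specific connection: the check throws before any scans would start.
            checkUpdateStatisticsAllowed(new Conn(Optional.of("tenant1")));
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check only reads the tenant id, it avoids the main drawback of option 1, which would require temporarily changing a field that is effectively final.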
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)