[ https://issues.apache.org/jira/browse/PHOENIX-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149618#comment-14149618 ]
James Taylor commented on PHOENIX-1296:
---------------------------------------

Thanks for the patch, [~ramkrishna]. Much of the work you're doing here is more related to PHOENIX-1263. I'd attach this patch over there with some modifications and/or combine it with this one. Take a look there - I'll add a comment.

Good catch on the empty key value not being processed by that loop. The easiest way to get that to happen (rather than adding an extra loop) is to add the empty key value to our TABLE_KV_COLUMNS list here:
{code}
    private static final KeyValue EMPTY_KEYVALUE_KV = KeyValue.createFirstOnRow(
            ByteUtil.EMPTY_BYTE_ARRAY, TABLE_FAMILY_BYTES, QueryConstants.EMPTY_COLUMN_BYTES);
    private static final List<KeyValue> TABLE_KV_COLUMNS = Arrays.<KeyValue>asList(
            EMPTY_KEYVALUE_KV,
{code}

As far as your question:
bq. I don't think this we have to do. Consider there are two tenants that has the Tenant ID as AZ and SZ (for example).

Yes, we do have to do this, but there's an easy way to do it (see below). First, why do we have to do it? Along the same lines as your example, let's say that tenants AY and SY both live in the same region, and that we have the following guideposts for this region:
AYd, AYm, SYg, SYt

Now if we allow the ANALYZE to run only over the range [AY - AZ), then we'll recompute new guideposts only for this range, say AYc, AYj, AYm, AYv, and these would replace all of the guideposts for this region. The SYg and SYt guideposts would be inadvertently removed, because we didn't scan the entire region.

The fix is pretty easy, though. On the server side, in the UngroupedAggregateRegionObserver, in your if statement that detects that an analyze is being done, just always set the scan start/stop row to HConstants.EMPTY_START_ROW and HConstants.EMPTY_END_ROW. This will force the scan to go over the entire region when an analyze is done, which is exactly what we want.
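To make the guidepost-loss scenario above concrete, here is a minimal, self-contained sketch. It is not Phoenix code: the class GuidepostSketch, the method collectGuideposts, the sample row keys, and the "every Nth row" guidepost rule are all hypothetical stand-ins chosen only for illustration. An empty string plays the role of an unbounded start or stop row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one region's rows and guideposts; not Phoenix code.
class GuidepostSketch {

    // "Scans" the sorted row keys in [startRow, stopRow) and emits every
    // Nth scanned row as a guidepost. An empty string means "unbounded"
    // (an assumption standing in for an empty start/end row).
    static List<String> collectGuideposts(List<String> sortedRegionRows,
                                          String startRow, String stopRow, int every) {
        List<String> guideposts = new ArrayList<>();
        int seen = 0;
        for (String row : sortedRegionRows) {
            boolean afterStart = startRow.isEmpty() || row.compareTo(startRow) >= 0;
            boolean beforeStop = stopRow.isEmpty() || row.compareTo(stopRow) < 0;
            if (afterStart && beforeStop) {
                seen++;
                if (seen % every == 0) {
                    guideposts.add(row);
                }
            }
        }
        return guideposts;
    }

    public static void main(String[] args) {
        // One region holding rows for two tenants, AY and SY.
        List<String> regionRows = Arrays.asList(
                "AYa", "AYd", "AYj", "AYm", "SYc", "SYg", "SYp", "SYt");

        // Tenant-bounded analyze: only [AY, AZ) is scanned, so the result
        // contains no SY guideposts. Replacing the region's stats with this
        // list would silently drop them.
        List<String> tenantOnly = collectGuideposts(regionRows, "AY", "AZ", 2);

        // Whole-region analyze: unbounded start/stop rows cover both tenants.
        List<String> wholeRegion = collectGuideposts(regionRows, "", "", 2);

        System.out.println("tenant-bounded scan: " + tenantOnly);
        System.out.println("whole-region scan:   " + wholeRegion);
    }
}
```

The tenant-bounded call yields guideposts only for AY, while the unbounded call yields guideposts for both tenants, which is why the analyze scan needs to cover the whole region before the region's stats are replaced.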
> Scan entire region when tenant-specific table is analyzed
> ---------------------------------------------------------
>
>                 Key: PHOENIX-1296
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1296
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1296_1.patch
>
>
> Based on the issue you've uncovered (that stats must be updated completely
> for a region), there's a bit of follow on work needed if an ANALYZE is done
> on a tenant-specific table. This case will be optimized to only scan and
> analyze the current tenant's data, however we have to make sure that the
> entire region(s) containing that tenant's data is scanned (or we'll end up
> replacing the stats for that region with just the one we calculated for that
> tenant).
> We should be able to do that based on ScanUtil.isAnalyzeTable(scan) being
> true in DefaultParallelIteratorRegionSplitter and/or ParallelIterators.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)