[jira] [Commented] (PHOENIX-1296) Scan entire region when tenant-specific table is analyzed

ramkrishna.s.vasudevan (JIRA) Fri, 26 Sep 2014 05:29:30 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149090#comment-14149090
 ]


ramkrishna.s.vasudevan commented on PHOENIX-1296:
-------------------------------------------------

Anyway, I have updated the patch so that guideposts work in the 
multi-tenant scenario.

->Previously the tenantId was not passed correctly when clearing the cache 
after ANALYZE 'table' is done.
->In order to get the entry updated, we add an entry in SYSTEM.CATALOG 
for the table with ts + 1 for the EMPTY_COLUMN (_0).
But for a multi-tenant table this does not work correctly:
{code}
[ZZTop\x00\x00TENANT_TABLE/0:COLUMN_COUNT/300/Put/vlen=4/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:DISABLE_WAL/300/Put/vlen=1/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:IMMUTABLE_ROWS/300/Put/vlen=1/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:MULTI_TENANT/300/Put/vlen=1/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:TABLE_SEQ_NUM/300/Put/vlen=8/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:TABLE_TYPE/300/Put/vlen=1/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:VIEW_STATEMENT/300/Put/vlen=57/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:VIEW_TYPE/300/Put/vlen=1/mvcc=7, 
ZZTop\x00\x00TENANT_TABLE/0:_0/301/Put/vlen=0/mvcc=10]
{code}

{code}
0:DEFAULT_COLUMN_FAMILY/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:DISABLE_WAL/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:IMMUTABLE_ROWS/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:INDEX_DISABLE_TIMESTAMP/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:INDEX_STATE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:INDEX_TYPE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:MULTI_TENANT/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:PK_NAME/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:SALT_BUCKETS/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:TABLE_SEQ_NUM/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:TABLE_TYPE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:VIEW_INDEX_ID/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:VIEW_STATEMENT/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0, 
/0:VIEW_TYPE/LATEST_TIMESTAMP/Maximum/vlen=0/mvcc=0]
{code}

So the last cell in the result (_0) is skipped in this loop:
{code}
        while (i < results.size() && j < TABLE_KV_COLUMNS.size()) {
            Cell kv = results.get(i);
            Cell searchKv = TABLE_KV_COLUMNS.get(j);
            int cmp =
                    Bytes.compareTo(kv.getQualifierArray(), kv.getQualifierOffset(),
                        kv.getQualifierLength(), searchKv.getQualifierArray(),
                        searchKv.getQualifierOffset(), searchKv.getQualifierLength());
            if (cmp == 0) {
                // Find max timestamp of table header row
                timeStamp = Math.max(timeStamp, kv.getTimestamp());
                tableKeyValues[j++] = kv;
                i++;
            } else if (cmp > 0) {
                timeStamp = Math.max(timeStamp, kv.getTimestamp()); 
                tableKeyValues[j++] = null;
            } else {
                // shouldn't happen - means unexpected KV in system table header row
                i++;
            }
        }
{code}
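
To make the skip concrete, here is a minimal standalone sketch of the same merge loop, run over the qualifier names from the two dumps above. It is not Phoenix code: the class name, the method mergeStopIndex, and the use of plain strings in place of Cell qualifiers are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

class HeaderRowMergeSketch {
    // Qualifiers of the cells returned for the tenant table (first dump above).
    static final List<String> RESULTS = Arrays.asList(
        "COLUMN_COUNT", "DISABLE_WAL", "IMMUTABLE_ROWS", "MULTI_TENANT",
        "TABLE_SEQ_NUM", "TABLE_TYPE", "VIEW_STATEMENT", "VIEW_TYPE", "_0");

    // Qualifiers of the searched columns (second dump above, as it appears there).
    static final List<String> SEARCHED = Arrays.asList(
        "DEFAULT_COLUMN_FAMILY", "DISABLE_WAL", "IMMUTABLE_ROWS",
        "INDEX_DISABLE_TIMESTAMP", "INDEX_STATE", "INDEX_TYPE", "MULTI_TENANT",
        "PK_NAME", "SALT_BUCKETS", "TABLE_SEQ_NUM", "TABLE_TYPE",
        "VIEW_INDEX_ID", "VIEW_STATEMENT", "VIEW_TYPE");

    // Same control flow as the loop above, on strings instead of Cells.
    // Returns the index of the first result the loop never examined.
    static int mergeStopIndex(List<String> results, List<String> searched, String[] matched) {
        int i = 0, j = 0;
        while (i < results.size() && j < searched.size()) {
            int cmp = results.get(i).compareTo(searched.get(j));
            if (cmp == 0) {
                matched[j++] = results.get(i); // searched column found in the row
                i++;
            } else if (cmp > 0) {
                matched[j++] = null;           // searched column absent from the row
            } else {
                i++;                           // row column that is not searched for
            }
        }
        return i;
    }

    public static void main(String[] args) {
        String[] matched = new String[SEARCHED.size()];
        int stop = mergeStopIndex(RESULTS, SEARCHED, matched);
        // The loop ends once the searched list is exhausted, leaving the tail unread.
        System.out.println("unread cells: " + RESULTS.subList(stop, RESULTS.size()));
    }
}
```

Because "_" sorts after every uppercase letter, _0 is always the last cell of the row, and the loop stops as soon as the searched list runs out, before it gets there.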

So in order to use that entry I have just added a check like this
{code}
        while (i < results.size()) {
            Cell kv = results.get(i);
            if (Bytes.compareTo(kv.getQualifierArray(), kv.getQualifierOffset(),
                    kv.getQualifierLength(), QueryConstants.EMPTY_COLUMN_BYTES, 0,
                    QueryConstants.EMPTY_COLUMN_BYTES.length) == 0) {
                keyValue = kv;
                break;
            }
            i++;
        }
{code}
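
The added check can be tried outside HBase with a small stand-in. This is only a sketch under stated assumptions: SimpleCell, rangeEquals, sampleRow and findEmptyColumnCell are hypothetical names, SimpleCell imitates just the qualifier offset/length layout of a Cell, and the empty column qualifier is taken to be "_0" as described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EmptyColumnScanSketch {
    // Assumption: the empty column qualifier is "_0".
    static final byte[] EMPTY_COLUMN_BYTES = "_0".getBytes(StandardCharsets.UTF_8);

    /** Hypothetical stand-in for a Cell: the qualifier sits inside a larger buffer. */
    static final class SimpleCell {
        final byte[] buffer;
        final int qualifierOffset;
        final int qualifierLength;
        final long timestamp;

        SimpleCell(String qualifier, long timestamp) {
            byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
            // Prefix "0:" (family and separator) so the qualifier offset is non-zero.
            this.buffer = new byte[q.length + 2];
            this.buffer[0] = '0';
            this.buffer[1] = ':';
            System.arraycopy(q, 0, this.buffer, 2, q.length);
            this.qualifierOffset = 2;
            this.qualifierLength = q.length;
            this.timestamp = timestamp;
        }
    }

    // Equality of two byte ranges; stands in for Bytes.compareTo(...) == 0.
    static boolean rangeEquals(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        if (aLen != bLen) return false;
        for (int k = 0; k < aLen; k++) {
            if (a[aOff + k] != b[bOff + k]) return false;
        }
        return true;
    }

    // Continue from index i (where the merge loop stopped) and look for the _0 cell.
    static SimpleCell findEmptyColumnCell(List<SimpleCell> results, int i) {
        while (i < results.size()) {
            SimpleCell kv = results.get(i);
            if (rangeEquals(kv.buffer, kv.qualifierOffset, kv.qualifierLength,
                    EMPTY_COLUMN_BYTES, 0, EMPTY_COLUMN_BYTES.length)) {
                return kv;
            }
            i++;
        }
        return null; // no empty column cell in the remaining results
    }

    // Row shaped like the first dump above: everything at ts 300, _0 at ts 301.
    static List<SimpleCell> sampleRow() {
        String[] qualifiers = {"COLUMN_COUNT", "DISABLE_WAL", "IMMUTABLE_ROWS", "MULTI_TENANT",
            "TABLE_SEQ_NUM", "TABLE_TYPE", "VIEW_STATEMENT", "VIEW_TYPE"};
        List<SimpleCell> row = new ArrayList<>();
        for (String q : qualifiers) {
            row.add(new SimpleCell(q, 300L));
        }
        row.add(new SimpleCell("_0", 301L));
        return row;
    }

    public static void main(String[] args) {
        SimpleCell found = findEmptyColumnCell(sampleRow(), 8);
        System.out.println("empty column ts: " + (found == null ? "none" : found.timestamp));
    }
}
```

Picking up the _0 cell this way exposes its ts + 1 timestamp, which the merge loop alone would never see for this row.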


> Scan entire region when tenant-specific table is analyzed
> ---------------------------------------------------------
>
>                 Key: PHOENIX-1296
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1296
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>
> Based on the issue you've uncovered (that stats must be updated completely 
> for a region), there's a bit of follow on work needed if an ANALYZE is done 
> on a tenant-specific table. This case will be optimized to only scan and 
> analyze the current tenant's data, however we have to make sure that the 
> entire region(s) containing that tenant's data is scanned (or we'll end up 
> replacing the stats for that region with just the one we calculated for that 
> tenant).
> We should be able to do that based on ScanUtil.isAnalyzeTable(scan) being 
> true in DefaultParallelIteratorRegionSplitter and/or ParallelIterators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
