[ 
https://issues.apache.org/jira/browse/PHOENIX-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710545#comment-16710545
 ] 

Karan Mehta commented on PHOENIX-4544:
--------------------------------------

[[email protected]] Referring to this code from MetaDataClient.java file, which 
you modified as a part of this patch. 
{code:java}
for (PTable index : table.getIndexes()) {
    // If the table is a view, then we will end up calling update stats
    // here for all the view indexes on it. We take care of local indexes later.
    if (index.getIndexType() != IndexType.LOCAL) {
        if (index.getIndexType() != IndexType.LOCAL) {
            if (table.getType() != PTableType.VIEW) {
                rowCount += updateStatisticsInternal(index.getPhysicalName(), 
index,
                        updateStatisticsStmt.getProps(), true);
            } else {
                rowCount += updateStatisticsInternal(table.getPhysicalName(), 
index,
                        updateStatisticsStmt.getProps(), true);
            }
        }
    }
}
{code}
I am trying to run {{UPDATE STATISTICS VIEW_1}}. The PTable for the view 
fetches the indexes.

The else condition gets triggered whenever a view is passed in. Thus it always 
leads to {{updateStatisticsInternal}} getting called with base table's names 
all the time as the first parameter. This parameter is used to lookup the last 
stats update time from SYSTEM.STATS table. The PTable also returns the list of 
global indexes that it can potentially use (which are separate HBase tables), 
however we see the same behavior for those.

This behavior seems inconsistent. Is this a bug? OR Are we supposed to not 
collect statistics on indexes for a view?

> Update statistics inconsistent behavior 
> ----------------------------------------
>
>                 Key: PHOENIX-4544
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4544
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>            Reporter: Romil Choksi
>            Assignee: Ankit Singhal
>            Priority: Major
>             Fix For: 5.0.0, 4.14.1
>
>         Attachments: PHOENIX-4544.patch, PHOENIX-4544_v1.patch
>
>
> Update statistics may not generate the stats information for all dependent 
> indexes. And this behavior may depend on whether the command executed 
> synchronously or asynchronously.
> I have a table GIGANTIC_TABLE with ~500k rows with global index I1 and local 
> index I2.
> If async is turned on (the default value):
> {noformat}
> 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE ALL;
> No rows affected (0.081 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 5                             |
> +-------------------------------+
> 1 row selected (0.009 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 520                           |
> +-------------------------------+
> 1 row selected (0.014 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 0                             |
> +-------------------------------+
> 1 row selected (0.008 seconds)
> 0: jdbc:phoenix:>
> {noformat}
> As we can see there is no records for local index I2. But if we run 
> statistics for indexes:
> {noformat}
> 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX;
> No rows affected (0.036 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 20                            |
> +-------------------------------+
> 1 row selected (0.007 seconds)
> {noformat}
> the statistic for local index is generated correctly.
> Now we turn async off:
> {noformat}
> 0: jdbc:phoenix:> delete from SYSTEM.STATS;
> 547 rows affected (0.079 seconds)
> 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE ALL;
> 999,998 rows affected (4.671 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 520                           |
> +-------------------------------+
> 1 row selected (0.04 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 20                            |
> +-------------------------------+
> 1 row selected (0.012 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 0                             |
> +-------------------------------+
> 1 row selected (0.011 seconds)
> {noformat}
> As we can see we got statistics for the table itself and local index. But not 
> for the global index.
> Moreover, if we try to update statistics for indexes:
> {noformat}
> 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX;
> 499,999 rows affected (0.332 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 0                             |
> +-------------------------------+
> 1 row selected (0.009 seconds)
> {noformat}
> So, still no records for global index.
> But if we delete statistics first and run update for indexes:
> {noformat}
> 0: jdbc:phoenix:> delete from SYSTEM.STATS;
> 541 rows affected (0.024 seconds)
> 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX;
> 999,998 rows affected (0.41 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 5                             |
> +-------------------------------+
> 1 row selected (0.01 seconds)
> 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE 
> PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0';
> +-------------------------------+
> | COUNT(GUIDE_POSTS_ROW_COUNT)  |
> +-------------------------------+
> | 20                            |
> +-------------------------------+
> 1 row selected (0.01 seconds)
> {noformat}
> than we got statistics for both local and global indexes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to