[ https://issues.apache.org/jira/browse/PHOENIX-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503711#comment-16503711 ]
Josh Elser commented on PHOENIX-4544: ------------------------------------- {quote}bq. And this behavior may depend on whether the command executed synchronously or asynchronously. {quote} What is controlling the asynchronous-ness? The {{ALL}} keyword opposed to the {{INDEX}} keyword? {code:java} + for (byte[] k : keys) { + if (contains(keys2, k)) { return true; } + } + return false;{code} This looks a little funny to me. You're only checking that one {{byte[]}} from the first set is contained in the second (not all {{byte[]}} 's)? > Update statistics inconsistent behavior > ---------------------------------------- > > Key: PHOENIX-4544 > URL: https://issues.apache.org/jira/browse/PHOENIX-4544 > Project: Phoenix > Issue Type: Bug > Affects Versions: 5.0.0 > Reporter: Romil Choksi > Assignee: Ankit Singhal > Priority: Major > Attachments: PHOENIX-4544.patch > > > Update statistics may not generate the stats information for all dependent > indexes. And this behavior may depend on whether the command executed > synchronously or asynchronously. > I have a table GIGANTIC_TABLE with ~500k rows with global index I1 and local > index I2. > If async is turned on (the default value): > {noformat} > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE ALL; > No rows affected (0.081 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 5 | > +-------------------------------+ > 1 row selected (0.009 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 520 | > +-------------------------------+ > 1 row selected (0.014 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 0 | > +-------------------------------+ > 1 row selected (0.008 seconds) > 0: jdbc:phoenix:> > {noformat} > As we can see there is no records for local index I2. But if we run > statistics for indexes: > {noformat} > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX; > No rows affected (0.036 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 20 | > +-------------------------------+ > 1 row selected (0.007 seconds) > {noformat} > the statistic for local index is generated correctly. > Now we turn async off: > {noformat} > 0: jdbc:phoenix:> delete from SYSTEM.STATS; > 547 rows affected (0.079 seconds) > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE ALL; > 999,998 rows affected (4.671 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 520 | > +-------------------------------+ > 1 row selected (0.04 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 20 | > +-------------------------------+ > 1 row selected (0.012 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 0 | > +-------------------------------+ > 1 row selected (0.011 seconds) > {noformat} > As we can see we got statistics for the table itself and local index. But not > for the global index. > Moreover, if we try to update statistics for indexes: > {noformat} > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX; > 499,999 rows affected (0.332 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 0 | > +-------------------------------+ > 1 row selected (0.009 seconds) > {noformat} > So, still no records for global index. > But if we delete statistics first and run update for indexes: > {noformat} > 0: jdbc:phoenix:> delete from SYSTEM.STATS; > 541 rows affected (0.024 seconds) > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX; > 999,998 rows affected (0.41 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 5 | > +-------------------------------+ > 1 row selected (0.01 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 20 | > +-------------------------------+ > 1 row selected (0.01 seconds) > {noformat} > than we got statistics for both local and global indexes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)