[ https://issues.apache.org/jira/browse/HIVE-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170361#comment-14170361 ]
Sergey Shelukhin commented on HIVE-8433:
----------------------------------------

The schema is incorrect, and it looks like the schema check is broken. With ORDER BY, if an extra column (e.g. key) is added to the select list of the query, fixTopOB detects a schema mismatch and CBO fails. Without ORDER BY, the schema mismatch check is never performed at all, and the query, as described, coincidentally produces a correct plan and result. However, if the extra column (the one that causes the mismatch) is also the only ORDER BY column, fixTopOB removes it from the projection (presumably assuming it is there only for the ORDER BY), and the column counts of the projection and the (incorrect) schema then happen to match, so an incorrect result is produced.

> CBO loses a column during AST conversion
> ----------------------------------------
>
> Key: HIVE-8433
> URL: https://issues.apache.org/jira/browse/HIVE-8433
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Critical
>
> {noformat}
> SELECT
>   CAST(value AS BINARY),
>   value
> FROM src
> ORDER BY value
> LIMIT 100
> {noformat}
> returns only one column.
> Final CBO plan is
> {noformat}
> HiveSortRel(sort0=[$1], dir0=[ASC]): rowcount = 500.0, cumulative cost = {24858.432393688767 rows, 500.0 cpu, 0.0 io}, id = 44
>   HiveProjectRel(value=[CAST($0):BINARY(2147483647) NOT NULL], value1=[$0]): rowcount = 500.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 42
>     HiveProjectRel(value=[$1]): rowcount = 500.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 40
>       HiveTableScanRel(table=[[default.src]]): rowcount = 500.0, cumulative cost = {0}, id = 0
> {noformat}
> but the resulting AST has only one column.
> Must be some bug in conversion, probably related to the name collision in the schema, judging by the alias of the column for the binary-cast value in the AST
> {noformat}
> TOK_QUERY
>    TOK_FROM
>       TOK_SUBQUERY
>          TOK_QUERY
>             TOK_FROM
>                TOK_TABREF
>                   TOK_TABNAME
>                      default
>                      src
>                   src
>             TOK_INSERT
>                TOK_DESTINATION
>                   TOK_DIR
>                      TOK_TMP_FILE
>                TOK_SELECT
>                   TOK_SELEXPR
>                      .
>                         TOK_TABLE_OR_COL
>                            src
>                         value
>                      value
>          $hdt$_0
>    TOK_INSERT
>       TOK_DESTINATION
>          TOK_DIR
>             TOK_TMP_FILE
>       TOK_SELECT
>          TOK_SELEXPR
>             TOK_FUNCTION
>                TOK_BINARY
>                .
>                   TOK_TABLE_OR_COL
>                      $hdt$_0
>                   value
>             value
>       TOK_ORDERBY
>          TOK_TABSORTCOLNAMEASC
>             TOK_TABLE_OR_COL
>                value
>       TOK_LIMIT
>          100
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
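The interaction described in the comment at the top (trim the ORDER BY column, then compare widths) can be illustrated with a toy model. This is a sketch only: `fix_top_ob_model` and its arguments are hypothetical and are not Hive's fixTopOB. The one-column schema for the original query follows from the comment (removing one of the two projected columns makes the counts match); the two-column schema for the `key` variant is an assumption chosen to mirror that case.

```python
# Toy model only: fix_top_ob_model is a hypothetical stand-in, NOT Hive's
# fixTopOB. It mimics the behaviour described in the comment: surplus
# trailing projection columns that are ORDER BY keys get dropped, then the
# projection width is compared with the expected result schema.


def fix_top_ob_model(projection, schema, order_by_positions):
    """Return the projection that would be exposed as the query result.

    projection:         column names of the top projection, in order.
    schema:             column names of the (possibly incorrect) result schema.
    order_by_positions: set of projection indexes used as sort keys,
                        or None when the query has no ORDER BY.
    """
    if order_by_positions is None:
        # No ORDER BY: nothing is trimmed and no mismatch check runs.
        return list(projection)

    trimmed = list(projection)
    # Assume every surplus trailing column exists only to feed the sort.
    while len(trimmed) > len(schema) and (len(trimmed) - 1) in order_by_positions:
        trimmed.pop()

    if len(trimmed) != len(schema):
        # Widths still disagree: the mismatch is detected (CBO fails).
        raise ValueError(f"schema mismatch: {trimmed} vs {schema}")
    return trimmed


# The query from the issue: the plan projects [value, value1] and sorts on $1;
# the incorrect schema is assumed to hold a single column.
print(fix_top_ob_model(["value", "value1"], ["value"], {1}))  # ['value']

# Same columns without ORDER BY: untouched, so both columns survive.
print(fix_top_ob_model(["value", "value1"], ["value"], None))  # ['value', 'value1']

# Extra column `key` that is not the sort key: trimming stops, mismatch detected.
try:
    fix_top_ob_model(["value", "value1", "key"], ["value", "key"], {1})
except ValueError as err:
    print(err)
```

The while-loop condition is the crux of the model: trimming is driven only by position and width, so a schema that is one column short gets silently "repaired" exactly when the surplus column is also the sort key, which is the wrong-result case from the issue.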