[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773677#comment-13773677 ]
Yin Huai commented on HIVE-4113:
--------------------------------

If a test query contains a query evaluated by multiple MR jobs, the corresponding golden file will need to be updated, because all dummy TableScanOperators will appear in the query plans. If we do not want these updates right now, we can change GenMapRedUtils.createTemporaryTableScanOperator(RowSchema) to use
{code}
TableScanOperator tableScanOp =
    (TableScanOperator) OperatorFactory.get(TableScanDesc.class, rowSchema);
{code}
instead of
{code}
TableScanOperator tableScanOp =
    (TableScanOperator) OperatorFactory.get(new TableScanDesc(), rowSchema);
{code}

> Optimize select count(1) with RCFile and Orc
> --------------------------------------------
>
>                 Key: HIVE-4113
>                 URL: https://issues.apache.org/jira/browse/HIVE-4113
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Gopal V
>            Assignee: Yin Huai
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.3.patch, HIVE-4113.4.patch, HIVE-4113.5.patch, HIVE-4113.6.patch, HIVE-4113.7.patch, HIVE-4113.8.patch, HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>       } else {
>         // TODO: if no column name is specified e.g, in select count(1) from tt;
>         // skip all columns, this should be distinguished from the case:
>         // select * from tt;
>         for (int i = 0; i < skippedColIDs.length; i++) {
>           skippedColIDs[i] = false;
>         }
> {code}
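
The TODO quoted above asks for "no column referenced" (select count(1)) to be told apart from "all columns referenced" (select *). A minimal sketch of that distinction follows. It is only an illustration, not the code in RCFile.java or in any of the attached patches: the class ColumnSkipSketch, the method computeSkippedColIDs, and the readAllColumns flag are all hypothetical names introduced here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hive.
class ColumnSkipSketch {

    /**
     * Builds a skip mask in the spirit of skippedColIDs from the quoted snippet.
     *
     * @param numColumns     total number of columns stored in the file
     * @param neededColIDs   column IDs the query references (empty for count(1))
     * @param readAllColumns assumed explicit flag for "select *"-style reads
     * @return mask where true means "do not read this column"
     */
    static boolean[] computeSkippedColIDs(int numColumns,
                                          List<Integer> neededColIDs,
                                          boolean readAllColumns) {
        boolean[] skipped = new boolean[numColumns];
        if (readAllColumns) {
            // select * : nothing is skipped (the array is all false by default).
            return skipped;
        }
        // Otherwise start by skipping everything, then un-skip what is needed.
        // An empty list (count(1)) therefore leaves every column skipped.
        Arrays.fill(skipped, true);
        for (int id : neededColIDs) {
            if (id >= 0 && id < numColumns) {
                skipped[id] = false;
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // count(1): no column referenced, so all columns are skipped.
        System.out.println(Arrays.toString(
            computeSkippedColIDs(3, Collections.<Integer>emptyList(), false)));
        // count(col 1): only column 1 is read.
        System.out.println(Arrays.toString(
            computeSkippedColIDs(3, Arrays.asList(1), false)));
        // select *: every column is read.
        System.out.println(Arrays.toString(
            computeSkippedColIDs(3, Collections.<Integer>emptyList(), true)));
    }
}
```

The point of the sketch is that an empty list of needed columns is ambiguous on its own, so the "read everything" case has to be signalled separately (here by an explicit flag) rather than inferred from the list being empty.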