[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773677#comment-13773677
 ] 

Yin Huai commented on HIVE-4113:
--------------------------------

If a test query contains a query evaluated by multiple MR jobs, the 
corresponding golden file will need to be updated because all dummy 
TableScanOperators will appear in query plans. If we do not want this kind of 
updates right now, we can change 
GenMapRedUtils.createTemporaryTableScanOperator(RowSchema) to use 
{code}
TableScanOperator tableScanOp = (TableScanOperator) 
OperatorFactory.get(TableScanDesc.class, rowSchema);
{code}
instead of 
{code}
TableScanOperator tableScanOp = (TableScanOperator) OperatorFactory.get(new 
TableScanDesc(), rowSchema);
{code}
                
> Optimize select count(1) with RCFile and Orc
> --------------------------------------------
>
>                 Key: HIVE-4113
>                 URL: https://issues.apache.org/jira/browse/HIVE-4113
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Gopal V
>            Assignee: Yin Huai
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
> HIVE-4113.3.patch, HIVE-4113.4.patch, HIVE-4113.5.patch, HIVE-4113.6.patch, 
> HIVE-4113.7.patch, HIVE-4113.8.patch, HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Where as, "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>       } else {
>         // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
>         // skip all columns, this should be distinguished from the case:
>         // select * from tt;
>         for (int i = 0; i < skippedColIDs.length; i++) {
>           skippedColIDs[i] = false;
>         }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to