[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773134#comment-13773134
 ] 

Ashutosh Chauhan commented on HIVE-4113:
----------------------------------------

It seems that, instead of a null check, a more elegant fix is for TableScanOp 
to always contain the list of columns it wants to read, even for subsequent MR 
jobs. I'm not sure how easy that is, though; it will probably require changes 
in the query planner. Yin, can you take a quick look at whether it's easy to 
fix it that way? If it turns out to be quite a bit of work, we can do that in 
a follow-up too.
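
To make the suggestion concrete, here is a minimal sketch of the contract I have in mind. The class and method names are hypothetical and are not Hive's actual TableScanOperator API; the only assumption is that the planner always hands the scan an explicit list, so readers never have to interpret null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a table scan operator; this is not Hive's
// TableScanOperator, only an illustration of a "never null" contract.
final class ScanColumnsSketch {
    private final List<Integer> neededColumnIds;

    // The planner must always pass an explicit list. An empty list means
    // "no column data needed" (e.g. count(1)); a select * would enumerate
    // every column id instead of leaving the list unset.
    ScanColumnsSketch(List<Integer> neededColumnIds) {
        Objects.requireNonNull(neededColumnIds, "planner must set needed columns");
        this.neededColumnIds =
            Collections.unmodifiableList(new ArrayList<>(neededColumnIds));
    }

    List<Integer> getNeededColumnIds() {
        return neededColumnIds;
    }

    public static void main(String[] args) {
        ScanColumnsSketch countOne =
            new ScanColumnsSketch(Collections.<Integer>emptyList());
        ScanColumnsSketch selectStar =
            new ScanColumnsSketch(Arrays.asList(0, 1, 2));
        System.out.println(countOne.getNeededColumnIds());   // prints []
        System.out.println(selectStar.getNeededColumnIds()); // prints [0, 1, 2]
    }
}
```

With a contract like this, the file-format readers could drop the null check, at the cost of the planner having to populate the list for every scan, including scans in later MR jobs.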
                
> Optimize select count(1) with RCFile and Orc
> --------------------------------------------
>
>                 Key: HIVE-4113
>                 URL: https://issues.apache.org/jira/browse/HIVE-4113
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Gopal V
>            Assignee: Yin Huai
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
> HIVE-4113.3.patch, HIVE-4113.4.patch, HIVE-4113.5.patch, HIVE-4113.6.patch, 
> HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less:
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 HDFS Write: 8 SUCCESS
> {code}
> That is about 12% of the data read by the count(1) query.
> This was tracked down to the following code in RCFile.java:
> {code}
>       } else {
>         // TODO: if no column name is specified e.g, in select count(1) from tt;
>         // skip all columns, this should be distinguished from the case:
>         // select * from tt;
>         for (int i = 0; i < skippedColIDs.length; i++) {
>           skippedColIDs[i] = false;
>         }
> {code}
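
For illustration only, the distinction that the TODO asks for could be sketched as below. This is not the RCFile code or any of the attached patches; `buildSkipMask` and the `readAllColumns` flag are hypothetical names, and the sketch assumes the caller can tell "select *" apart from "no columns needed".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of RCFile.java.
final class SkipMaskSketch {
    // Returns a mask where true means "skip this column".
    static boolean[] buildSkipMask(int columnCount,
                                   List<Integer> neededColumnIds,
                                   boolean readAllColumns) {
        boolean[] skipped = new boolean[columnCount];
        if (readAllColumns) {
            // select * from tt: nothing is skipped (array defaults to false).
            return skipped;
        }
        // Start by skipping everything, then un-skip the needed columns.
        // With an empty list (select count(1) from tt) every column stays skipped.
        Arrays.fill(skipped, true);
        for (int id : neededColumnIds) {
            if (id >= 0 && id < columnCount) {
                skipped[id] = false;
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<Integer> none = Collections.<Integer>emptyList();
        // count(1): prints [true, true, true]
        System.out.println(Arrays.toString(buildSkipMask(3, none, false)));
        // count(col 1): prints [true, false, true]
        System.out.println(Arrays.toString(buildSkipMask(3, Arrays.asList(1), false)));
        // select *: prints [false, false, false]
        System.out.println(Arrays.toString(buildSkipMask(3, none, true)));
    }
}
```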

