[ 
https://issues.apache.org/jira/browse/HIVE-12882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102630#comment-15102630
 ] 

Prasanth Jayachandran commented on HIVE-12882:
----------------------------------------------

RCFile supports the following modes, each slower than the one before:
noscan - does not read files; gets total file count and total file size from 
HDFS APIs (fastest)
partialscan - partially reads files to get metadata (row count) (fast)
fullscan - reads row-by-row to compute raw data size (slow)

In the case of ORC, all of these can be retrieved from the ORC file footer, so 
there is only one mode, which is fast. 

RCFile needs all 3 modes. 
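
To make the trade-off concrete, here is a rough sketch of picking the cheapest 
mode for a given file format. This is not Hive code: the function name, the 
"footer" label for ORC, and the stat key names are hypothetical, and it assumes 
each slower RCFile mode also yields whatever the faster ones do.

```python
# Hypothetical sketch, not Hive code. Modes are listed cheapest first.
# Assumption: each slower RCFile mode also yields the stats of the faster ones.
RCFILE_MODES = [
    ("noscan", {"numFiles", "totalSize"}),
    ("partialscan", {"numFiles", "totalSize", "numRows"}),
    ("fullscan", {"numFiles", "totalSize", "numRows", "rawDataSize"}),
]

ALL_STATS = RCFILE_MODES[-1][1]


def cheapest_mode(file_format, wanted_stats):
    """Return the name of the cheapest mode that yields every wanted stat."""
    wanted = set(wanted_stats)
    unknown = wanted - ALL_STATS
    if unknown:
        raise ValueError(f"unknown stats: {sorted(unknown)}")
    fmt = file_format.upper()
    if fmt == "ORC":
        # ORC: everything is in the file footer, so there is a single fast mode.
        return "footer"
    if fmt == "RCFILE":
        for name, provided in RCFILE_MODES:
            if wanted <= provided:
                return name
    raise ValueError(f"unsupported file format: {file_format}")
```

With this sketch, asking for only totalSize on RCFile picks noscan, while 
asking for rawDataSize forces fullscan. On ORC the answer is the same single 
mode regardless of which stats are wanted.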



> Automatically choose to use noscan for stats collection
> -------------------------------------------------------
>
>                 Key: HIVE-12882
>                 URL: https://issues.apache.org/jira/browse/HIVE-12882
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>
> noscan leverages the file system to derive the #rows and rawDataSize. 
> According to [~ashutoshc], it currently works only with the RC and ORC file 
> types. We would like Hive to automatically choose between noscan and scan 
> based on the file system when the stats task starts or when the user issues 
> the same query "Analyze ...."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
