[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981840#comment-15981840
 ] 

Sahil Takiar commented on HIVE-15396:
-------------------------------------

[~pxiong], I wanted to see whether we can still get this patch in. Let me know 
what you think of the most recent patch. To summarize:

* The patch adds basic stats collection for tables with a {{LOCATION}} 
specified, but only if the specified location is empty and the table is not an 
external table
* This should be useful when running on blobstores such as S3, where users 
commonly specify an explicit {{LOCATION}} clause
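
To make the condition in the first bullet concrete, here is a rough sketch in 
Java. It is not the code from the patch: {{BasicStatsSketch}}, 
{{canInitBasicStats}}, and {{initialStats}} are hypothetical names, and the 
parameter keys are simply the ones that appear in the {{describe formatted}} 
output quoted in the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not Hive's actual implementation. */
final class BasicStatsSketch {
    private BasicStatsSketch() {}

    /**
     * Basic stats are initialized at create time only for a managed
     * (non-external) table whose specified location contains no files.
     */
    static boolean canInitBasicStats(boolean isExternal, boolean locationIsEmpty) {
        return !isExternal && locationIsEmpty;
    }

    /** Returns the zeroed table parameters set at creation, or an empty map. */
    static Map<String, String> initialStats(boolean isExternal, boolean locationIsEmpty) {
        Map<String, String> params = new LinkedHashMap<>();
        if (canInitBasicStats(isExternal, locationIsEmpty)) {
            params.put("COLUMN_STATS_ACCURATE", "{\"BASIC_STATS\":\"true\"}");
            params.put("numFiles", "0");
            params.put("numRows", "0");
            params.put("rawDataSize", "0");
            params.put("totalSize", "0");
        }
        return params;
    }
}
```

In this sketch, {{initialStats(false, true)}} yields the five parameters, while 
an external table or a non-empty location yields an empty map, presumably 
because files already present at the location would make zeroed stats wrong.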

Thanks for spending the time to look at this!

> Basic Stats are not collected for managed tables with LOCATION specified
> ------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, 
> HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
> +-------------------------------+----------------------------------------------------+-----------------------------+
> |           col_name            |                     data_type                      |           comment           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> | # col_name                    | data_type                                          | comment                     |
> |                               | NULL                                               | NULL                        |
> | col                           | int                                                |                             |
> |                               | NULL                                               | NULL                        |
> | # Detailed Table Information  | NULL                                               | NULL                        |
> | Database:                     | default                                            | NULL                        |
> | Owner:                        | anonymous                                          | NULL                        |
> | CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                       | NULL                        |
> | LastAccessTime:               | UNKNOWN                                            | NULL                        |
> | Retention:                    | 0                                                  | NULL                        |
> | Location:                     | file:/warehouse/hdfs_1                             | NULL                        |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                        |
> | Table Parameters:             | NULL                                               | NULL                        |
> |                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\"}  |
> |                               | numFiles                                           | 0                           |
> |                               | numRows                                            | 0                           |
> |                               | rawDataSize                                        | 0                           |
> |                               | totalSize                                          | 0                           |
> |                               | transient_lastDdlTime                              | 1490231359                  |
> |                               | NULL                                               | NULL                        |
> | # Storage Information         | NULL                                               | NULL                        |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                        |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                        |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
> | Compressed:                   | No                                                 | NULL                        |
> | Num Buckets:                  | -1                                                 | NULL                        |
> | Bucket Columns:               | []                                                 | NULL                        |
> | Sort Columns:                 | []                                                 | NULL                        |
> | Storage Desc Params:          | NULL                                               | NULL                        |
> |                               | serialization.format                               | 1                           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> 0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
> 0: jdbc:hive2://localhost:10000> describe formatted s3_1;
> +-------------------------------+----------------------------------------------------+-----------------------+
> |           col_name            |                     data_type                      |        comment        |
> +-------------------------------+----------------------------------------------------+-----------------------+
> | # col_name                    | data_type                                          | comment               |
> |                               | NULL                                               | NULL                  |
> | col                           | int                                                |                       |
> |                               | NULL                                               | NULL                  |
> | # Detailed Table Information  | NULL                                               | NULL                  |
> | Database:                     | default                                            | NULL                  |
> | Owner:                        | anonymous                                          | NULL                  |
> | CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                       | NULL                  |
> | LastAccessTime:               | UNKNOWN                                            | NULL                  |
> | Retention:                    | 0                                                  | NULL                  |
> | Location:                     | s3a://[bucket]/test-tables/s3-1                    | NULL                  |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                  |
> | Table Parameters:             | NULL                                               | NULL                  |
> |                               | transient_lastDdlTime                              | 1490231401            |
> |                               | NULL                                               | NULL                  |
> | # Storage Information         | NULL                                               | NULL                  |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                  |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                  |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
> | Compressed:                   | No                                                 | NULL                  |
> | Num Buckets:                  | -1                                                 | NULL                  |
> | Bucket Columns:               | []                                                 | NULL                  |
> | Sort Columns:                 | []                                                 | NULL                  |
> | Storage Desc Params:          | NULL                                               | NULL                  |
> |                               | serialization.format                               | 1                     |
> +-------------------------------+----------------------------------------------------+-----------------------+
> {code}
> The {{describe formatted}} output for the S3 table shows no stats at all. 
> Furthermore, the {{numRows}} stat is not collected when inserting into the 
> S3 table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
