[ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939402#comment-15939402 ]
Sahil Takiar commented on HIVE-15396:
-------------------------------------
Thanks [~pxiong] for taking a look! I see this behavior even when the
specified location is empty. What if I update the patch so that all stats are
collected only when the target location is empty? The use case is Hive-on-S3,
where it's common practice to create managed Hive tables with an explicit
location, e.g. {{CREATE TABLE s3_table (col int) LOCATION 's3a://[bucket-name]/s3_table/'}}.
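To make the proposal concrete, here is a rough sketch of the check, assuming it runs at CREATE TABLE time for managed tables. It is not taken from HIVE-15396.1.patch: the class {{EmptyLocationStats}} and method {{canRecordZeroStats}} are hypothetical, and {{java.io.File}} stands in for Hadoop's {{FileSystem}} API, which a real change would need in order to list {{s3a://}} paths.

```java
import java.io.File;

// Hypothetical sketch (not the HIVE-15396 patch): decide whether a managed
// table created with an explicit LOCATION can be given zeroed basic stats
// (numFiles, numRows, rawDataSize, totalSize = 0) and marked
// COLUMN_STATS_ACCURATE. java.io.File stands in for Hadoop's FileSystem
// API, which the real change would need in order to handle s3a:// paths.
final class EmptyLocationStats {

    private EmptyLocationStats() {}

    /**
     * @param isManaged true for MANAGED_TABLE; external tables are skipped
     *                  because their location may already hold data
     * @param location  the directory named in the LOCATION clause
     * @return true only if the location is known to contain no entries
     */
    static boolean canRecordZeroStats(boolean isManaged, File location) {
        if (!isManaged) {
            return false;
        }
        if (!location.exists()) {
            // Assumption: a missing directory is created empty by the DDL.
            return true;
        }
        // list() returns null when the path is not a directory or cannot be
        // read; in that case stay conservative and record no stats.
        String[] entries = location.list();
        return entries != null && entries.length == 0;
    }
}
```

Returning {{false}} when the location is non-empty or cannot be listed keeps today's behavior (no basic stats) instead of recording zeros that might be wrong.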
> Basic Stats are not collected for managed tables with LOCATION specified
> ------------------------------------------------------------------------
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch
>
>
> Basic stats are not collected when a managed table is created with a
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
> +-------------------------------+----------------------------------------------------+-----------------------------+
> | col_name | data_type | comment |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> | # col_name | data_type | comment |
> | | NULL | NULL |
> | col | int | |
> | | NULL | NULL |
> | # Detailed Table Information | NULL | NULL |
> | Database: | default | NULL |
> | Owner: | anonymous | NULL |
> | CreateTime: | Wed Mar 22 18:09:19 PDT 2017 | NULL |
> | LastAccessTime: | UNKNOWN | NULL |
> | Retention: | 0 | NULL |
> | Location: | file:/warehouse/hdfs_1 | NULL |
> | Table Type: | MANAGED_TABLE | NULL |
> | Table Parameters: | NULL | NULL |
> | | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
> | | numFiles | 0 |
> | | numRows | 0 |
> | | rawDataSize | 0 |
> | | totalSize | 0 |
> | | transient_lastDdlTime | 1490231359 |
> | | NULL | NULL |
> | # Storage Information | NULL | NULL |
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
> | OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
> | Compressed: | No | NULL |
> | Num Buckets: | -1 | NULL |
> | Bucket Columns: | [] | NULL |
> | Sort Columns: | [] | NULL |
> | Storage Desc Params: | NULL | NULL |
> | | serialization.format | 1 |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> 0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
> 0: jdbc:hive2://localhost:10000> describe formatted s3_1;
> +-------------------------------+----------------------------------------------------+-----------------------+
> | col_name | data_type | comment |
> +-------------------------------+----------------------------------------------------+-----------------------+
> | # col_name | data_type | comment |
> | | NULL | NULL |
> | col | int | |
> | | NULL | NULL |
> | # Detailed Table Information | NULL | NULL |
> | Database: | default | NULL |
> | Owner: | anonymous | NULL |
> | CreateTime: | Wed Mar 22 18:10:01 PDT 2017 | NULL |
> | LastAccessTime: | UNKNOWN | NULL |
> | Retention: | 0 | NULL |
> | Location: | s3a://[bucket]/test-tables/s3-1 | NULL |
> | Table Type: | MANAGED_TABLE | NULL |
> | Table Parameters: | NULL | NULL |
> | | transient_lastDdlTime | 1490231401 |
> | | NULL | NULL |
> | # Storage Information | NULL | NULL |
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
> | OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
> | Compressed: | No | NULL |
> | Num Buckets: | -1 | NULL |
> | Bucket Columns: | [] | NULL |
> | Sort Columns: | [] | NULL |
> | Storage Desc Params: | NULL | NULL |
> | | serialization.format | 1 |
> +-------------------------------+----------------------------------------------------+-----------------------+
> {code}
> The {{describe formatted}} output for the S3 table contains no basic stats
> (only {{transient_lastDdlTime}} is set). Furthermore, inserting into the S3
> table does not populate the {{numRows}} stat.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)