[ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar updated HIVE-15396:
--------------------------------
Description:
Basic stats are not collected when a managed table is created with a specified
{{LOCATION}} clause.
{code}
0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
+-------------------------------+----------------------------------------------------+-----------------------------+
| col_name                      | data_type                                          | comment                     |
+-------------------------------+----------------------------------------------------+-----------------------------+
| # col_name                    | data_type                                          | comment                     |
|                               | NULL                                               | NULL                        |
| col                           | int                                                |                             |
|                               | NULL                                               | NULL                        |
| # Detailed Table Information  | NULL                                               | NULL                        |
| Database:                     | default                                            | NULL                        |
| Owner:                        | anonymous                                          | NULL                        |
| CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                       | NULL                        |
| LastAccessTime:               | UNKNOWN                                            | NULL                        |
| Retention:                    | 0                                                  | NULL                        |
| Location:                     | file:/Users/stakiar/Documents/idea/apache-hive/warehouse/hdfs_2 | NULL                        |
| Table Type:                   | MANAGED_TABLE                                      | NULL                        |
| Table Parameters:             | NULL                                               | NULL                        |
|                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\"}  |
|                               | numFiles                                           | 0                           |
|                               | numRows                                            | 0                           |
|                               | rawDataSize                                        | 0                           |
|                               | totalSize                                          | 0                           |
|                               | transient_lastDdlTime                              | 1490231359                  |
|                               | NULL                                               | NULL                        |
| # Storage Information         | NULL                                               | NULL                        |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                        |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                        |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
| Compressed:                   | No                                                 | NULL                        |
| Num Buckets:                  | -1                                                 | NULL                        |
| Bucket Columns:               | []                                                 | NULL                        |
| Sort Columns:                 | []                                                 | NULL                        |
| Storage Desc Params:          | NULL                                               | NULL                        |
|                               | serialization.format                               | 1                           |
+-------------------------------+----------------------------------------------------+-----------------------------+
0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location
's3a://[bucket]/test-tables/s3-1';
0: jdbc:hive2://localhost:10000> describe formatted s3_1;
+-------------------------------+----------------------------------------------------+-----------------------+
| col_name                      | data_type                                          | comment               |
+-------------------------------+----------------------------------------------------+-----------------------+
| # col_name                    | data_type                                          | comment               |
|                               | NULL                                               | NULL                  |
| col                           | int                                                |                       |
|                               | NULL                                               | NULL                  |
| # Detailed Table Information  | NULL                                               | NULL                  |
| Database:                     | default                                            | NULL                  |
| Owner:                        | anonymous                                          | NULL                  |
| CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                       | NULL                  |
| LastAccessTime:               | UNKNOWN                                            | NULL                  |
| Retention:                    | 0                                                  | NULL                  |
| Location:                     | s3a://cloudera-dev-hive-on-s3/test-tables/s3-6     | NULL                  |
| Table Type:                   | MANAGED_TABLE                                      | NULL                  |
| Table Parameters:             | NULL                                               | NULL                  |
|                               | transient_lastDdlTime                              | 1490231401            |
|                               | NULL                                               | NULL                  |
| # Storage Information         | NULL                                               | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                  |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
| Compressed:                   | No                                                 | NULL                  |
| Num Buckets:                  | -1                                                 | NULL                  |
| Bucket Columns:               | []                                                 | NULL                  |
| Sort Columns:                 | []                                                 | NULL                  |
| Storage Desc Params:          | NULL                                               | NULL                  |
|                               | serialization.format                               | 1                     |
+-------------------------------+----------------------------------------------------+-----------------------+
{code}
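The difference between the two outputs can be checked mechanically. The following is a minimal Python sketch, not part of Hive: the helper names and the two dictionaries are hypothetical, and the values are transcribed by hand from the Table Parameters rows above (the backslash escaping shown by Beeline is dropped).

```python
import json

# Basic-stat keys, taken from the Table Parameters rows of hdfs_1 above.
BASIC_STAT_KEYS = ("numFiles", "numRows", "rawDataSize", "totalSize")


def missing_basic_stats(params):
    """Return the basic-stat keys that are absent from a table-parameter dict."""
    return [key for key in BASIC_STAT_KEYS if key not in params]


def basic_stats_accurate(params):
    """Return True if COLUMN_STATS_ACCURATE marks BASIC_STATS as "true"."""
    raw = params.get("COLUMN_STATS_ACCURATE")
    if raw is None:
        return False
    try:
        flags = json.loads(raw)
    except ValueError:
        # The value was not valid JSON, so treat the stats as not accurate.
        return False
    return isinstance(flags, dict) and flags.get("BASIC_STATS") == "true"


# Hand-transcribed from the two "describe formatted" outputs above.
hdfs_1 = {
    "COLUMN_STATS_ACCURATE": '{"BASIC_STATS":"true"}',
    "numFiles": "0",
    "numRows": "0",
    "rawDataSize": "0",
    "totalSize": "0",
    "transient_lastDdlTime": "1490231359",
}
s3_1 = {"transient_lastDdlTime": "1490231401"}

for name, params in (("hdfs_1", hdfs_1), ("s3_1", s3_1)):
    print(name, "missing:", missing_basic_stats(params),
          "accurate:", basic_stats_accurate(params))
```

For the s3_1 parameters this reports all four keys as missing, which matches the symptom described above.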
was:
{{numRows}} is not collected when running {{INSERT ... INTO ...}} commands
against tables backed by S3 (and possibly other blobstores).
The COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"} entry is also missing from
the {{describe extended}} output.
Repro steps:
{code}
hive> drop table s3_table;
OK
Time taken: 1.87 seconds
hive> create table s3_table (col int) location
's3a://[bucket-name]/stats-test/';
OK
Time taken: 3.069 seconds
hive> insert into s3_table values (1), (2), (3);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez)
or using Hive 1.X releases.
Query ID = stakiar_20161208160105_fb3df340-d5fb-4ad6-8776-4f3cae02216d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-12-08 16:01:12,741 Stage-1 map = 0%, reduce = 0%
2016-12-08 16:01:16,759 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local688636529_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Loading data to table default.s3_table
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 23.0 seconds
hive> select * from s3_table;
OK
1
2
3
Time taken: 0.096 seconds, Fetched: 3 row(s)
hive> describe extended s3_table;
OK
col int
Detailed Table Information Table(tableName:s3_table, dbName:default,
owner:stakiar, createTime:1481241657, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)],
location:s3a://[bucket-name]/stats-test,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
skewedColValueLocationMaps:{}), storedAsSubDirectories:false),
partitionKeys:[], parameters:{transient_lastDdlTime=1481241687, totalSize=6,
numFiles=1}, viewOriginalText:null, viewExpandedText:null,
tableType:MANAGED_TABLE)
Time taken: 0.037 seconds, Fetched: 3 row(s)
{code}
> Basic Stats are not collected for managed tables with LOCATION specified
> ------------------------------------------------------------------------
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch