[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15396:
--------------------------------
    Description: 
{{numRows}} is not collected when running {{INSERT ... INTO ...}} commands 
against tables backed by S3 (and possibly other blobstores).

The COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"} entry is missing from the 
{{describe extended}} output.

Repro steps:

{code}
hive> drop table s3_table;
OK
Time taken: 1.87 seconds
hive> create table s3_table (col int) location 
's3a://[bucket-name]/stats-test/';
OK
Time taken: 3.069 seconds
hive> insert into s3_table values (1), (2), (3);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = stakiar_20161208160105_fb3df340-d5fb-4ad6-8776-4f3cae02216d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-12-08 16:01:12,741 Stage-1 map = 0%,  reduce = 0%
2016-12-08 16:01:16,759 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local688636529_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Loading data to table default.s3_table
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 23.0 seconds
hive> select * from s3_table;
OK
1
2
3
Time taken: 0.096 seconds, Fetched: 3 row(s)
hive> describe extended s3_table;
OK
col                     int

Detailed Table Information      Table(tableName:s3_table, dbName:default, 
owner:stakiar, createTime:1481241657, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], 
location:s3a://[bucket-name]/stats-test, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{transient_lastDdlTime=1481241687, totalSize=6, 
numFiles=1}, viewOriginalText:null, viewExpandedText:null, 
tableType:MANAGED_TABLE)
Time taken: 0.037 seconds, Fetched: 3 row(s)
{code}

  was:
{{numRows}} is not collected when running {{INSERT ... INTO ...}} commands 
against tables backed by S3 (and maybe even other blobstores).

The COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"} entry is missing from the 
{{describe extended}} output.

Repro steps:

{code}
hive> drop table s3_table;
OK
Time taken: 1.87 seconds
hive> create table s3_table (col int) location 
's3a://[bucket-name]/stats-test/';
OK
Time taken: 3.069 seconds
hive> insert into s3_table values (1), (2), (3);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = stakiar_20161208160105_fb3df340-d5fb-4ad6-8776-4f3cae02216d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-12-08 16:01:12,741 Stage-1 map = 0%,  reduce = 0%
2016-12-08 16:01:16,759 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local688636529_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Loading data to table default.s3_table
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 23.0 seconds
hive> select * from s3_table;
OK
1
2
3
Time taken: 0.096 seconds, Fetched: 3 row(s)
hive> describe extended s3_table;
OK
col                     int

Detailed Table Information      Table(tableName:s3_table, dbName:default, 
owner:stakiar, createTime:1481241657, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], 
location:s3a://cloudera-dev-hive-on-s3/stats-test, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{transient_lastDdlTime=1481241687, totalSize=6, 
numFiles=1}, viewOriginalText:null, viewExpandedText:null, 
tableType:MANAGED_TABLE)
Time taken: 0.037 seconds, Fetched: 3 row(s)
{code}


> Basic Stats are not collected when running INSERT INTO commands on s3a
> ----------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
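
One way to confirm the gap after running the repro steps, sketched in HiveQL. The statements are standard Hive commands, but they are not part of the original report, and whether {{ANALYZE TABLE}} actually populates {{numRows}} for an s3a-backed table is an assumption that needs to be verified.

{code}
-- List the table parameters. In the repro output only transient_lastDdlTime,
-- totalSize and numFiles are present; numRows is absent.
show tblproperties s3_table;

-- Recompute basic stats explicitly (assumed workaround, not verified on s3a).
analyze table s3_table compute statistics;

-- If the stats were collected, numRows and COLUMN_STATS_ACCURATE should be
-- listed under the table parameters.
describe formatted s3_table;
{code}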


