[jira] [Commented] (IMPALA-9258) impala and hive query result are different

authur wang (Jira) Tue, 17 Dec 2019 22:52:09 -0800


    [ 
https://issues.apache.org/jira/browse/IMPALA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998859#comment-16998859
 ]


authur wang commented on IMPALA-9258:
-------------------------------------

Here is the output of the same query run through Hive (port 10000) and then Impala (port 21050):

 
0: jdbc:hive2://wg3232.hadoop.com:10000/defau> select count(*) from 
gl_hisdb.tbl_chhis_accqr_transaction_inf where hp_settle_dt = '20190815';
INFO  : Compiling 
command(queryId=hive_20191218141818_434572dc-0796-4619-b489-525602cd7625): 
select count(*) from gl_hisdb.tbl_chhis_accqr_transaction_inf where 
hp_settle_dt = '20190815'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:bigint, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20191218141818_434572dc-0796-4619-b489-525602cd7625); Time 
taken: 0.256 seconds
INFO  : Executing 
command(queryId=hive_20191218141818_434572dc-0796-4619-b489-525602cd7625): 
select count(*) from gl_hisdb.tbl_chhis_accqr_transaction_inf where 
hp_settle_dt = '20190815'
WARN  : 
INFO  : Query ID = hive_20191218141818_434572dc-0796-4619-b489-525602cd7625
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1576632415577_0011
INFO  : Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: 
ha-hdfs:ns4, Ident: (token for hive: HDFS_DELEGATION_TOKEN 
owner=hive/[email protected], renewer=yarn, realUser=, 
issueDate=1576649898477, maxDate=1577254698477, sequenceNumber=5304183, 
masterKeyId=49)]
INFO  : The url to track the job: 
http://wg3226.hadoop.com:8088/proxy/application_1576632415577_0011/
INFO  : Starting Job = job_1576632415577_0011, Tracking URL = 
http://wg3226.hadoop.com:8088/proxy/application_1576632415577_0011/
INFO  : Kill Command = 
/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job 
 -kill job_1576632415577_0011
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of 
reducers: 1
INFO  : 2019-12-18 14:18:27,282 Stage-1 map = 0%,  reduce = 0%
INFO  : 2019-12-18 14:18:35,449 Stage-1 map = 100%,  reduce = 0%, Cumulative 
CPU 2.19 sec
INFO  : 2019-12-18 14:18:41,573 Stage-1 map = 100%,  reduce = 100%, Cumulative 
CPU 4.65 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 650 msec
INFO  : Ended Job = job_1576632415577_0011
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.65 sec   HDFS 
Read: 14570 HDFS Write: 104 HDFS EC Read: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 4 seconds 650 msec
INFO  : Completed executing 
command(queryId=hive_20191218141818_434572dc-0796-4619-b489-525602cd7625); Time 
taken: 24.31 seconds
INFO  : OK
+-------+
|  _c0  |
+-------+
| 1309  |
+-------+
1 row selected (24.616 seconds)
0: jdbc:hive2://wg3232.hadoop.com:10000/defau> 
 
0: jdbc:hive2://wg3232.hadoop.com:21050/defau> select count(*) from 
gl_hisdb.tbl_chhis_accqr_transaction_inf where hp_settle_dt = '20190815';
0
+-----------+
| count(*)  |
+-----------+
+-----------+
No rows selected (0.046 seconds)
0: jdbc:hive2://wg3232.hadoop.com:21050/defau> 

> impala and hive query result are different 
> -------------------------------------------
>
>                 Key: IMPALA-9258
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9258
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Clients
>    Affects Versions: Impala 3.2.0
>         Environment: CDH6.2.1
>            Reporter: authur wang
>            Priority: Major
>              Labels: newbie
>         Attachments: user_inf.zip
>
>
> After using MapReduce to generate RCFiles, we find that Hive and Impala 
> return different results for the same query. The Hive query returns the 
> correct result, while Impala returns a wrong result.
>  
> The attachment contains the data files.
>  
> The DDL of the table is:
> CREATE EXTERNAL TABLE user_inf (
>  id BIGINT,
>  user_id STRING,
>  cert_id STRING,
>  name STRING,
>  mobile STRING,
>  access_id STRING,
>  status STRING,
>  channel STRING,
>  rec_crt_ts STRING,
>  rec_upd_ts STRING,
>  ver INT
>  )
>  STORED AS RCFILE
>  LOCATION '/user_inf'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

