moranyuwen opened a new issue #3094:
URL: https://github.com/apache/hudi/issues/3094
I created a Hudi table `member2` using Spark. `select * from member2` succeeds, but `select count(*) from member2` fails.
Steps to reproduce the behavior:
1. Create the table with Spark:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieIndexConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.hudi.index.HoodieIndex
import org.apache.spark.sql.SaveMode

df.write.format("org.apache.hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option("hoodie.insert.shuffle.parallelism", "12")
  .option("hoodie.upsert.shuffle.parallelism", "12")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "uid")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "hudipartition")
  .option("hoodie.table.name", "member2")
  .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://node-1:10000")
  .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "default")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "member2")
  .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "dt,dn")
  .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[MultiPartKeysValueExtractor].getName)
  .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
  .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
  .mode(SaveMode.Append)
  .save("/user/hdfs/hudi/hivetest2")
```
2. Query 1: a simple select succeeds.

```
0: jdbc:hive2://node-2:10000> select uid,fullname from member2;
INFO  : OK
+--------+-----------+
|  uid   | fullname  |
+--------+-----------+
| 10001  | name1     |
| 10002  | name2     |
| 10003  | name3     |
| 10004  | name4     |
| 10005  | name5     |
+--------+-----------+
```
3. Query 2: a select with a `where` clause fails.

```
0: jdbc:hive2://node-2:10000> select uid,fullname from member2 where fullname = "name1";
INFO  : Compiling command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271): select uid,fullname from member2 where fullname = "name1"
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:uid, type:int, comment:null), FieldSchema(name:fullname, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271); Time taken: 0.575 seconds
INFO  : Executing command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271): select uid,fullname from member2 where fullname = "name1"
WARN  :
INFO  : Query ID = hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_1612164367687_1569
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://node-1:8088/proxy/application_1612164367687_1569/
INFO  : Starting Job = job_1612164367687_1569, Tracking URL = http://node-1:8088/proxy/application_1612164367687_1569/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1612164367687_1569
INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
INFO  : 2021-06-16 17:43:52,469 Stage-1 map = 0%, reduce = 0%
INFO  : 2021-06-16 17:44:17,925 Stage-1 map = 100%, reduce = 0%
ERROR : Ended Job = job_1612164367687_1569 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 2  HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271); Time taken: 52.498 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
```
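Note that "return code 2 from MapRedTask" hides the underlying exception; the real stack trace lives in the failed map task's YARN container logs (reachable via the tracking URL above). A common cause of this symptom when querying Hudi tables through Hive is the `hudi-hadoop-mr-bundle` jar being absent from the classpath of the MapReduce tasks. A hedged session-level sketch — the jar path below is an assumption, not taken from this report:

```sql
-- Session-level sketch; replace the path with your actual bundle location.
ADD JAR hdfs:///path/to/hudi-hadoop-mr-bundle-0.9.0.jar;
-- COW tables are read through HoodieParquetInputFormat; Hive's default
-- CombineHiveInputFormat can bypass it, so fall back to plain HiveInputFormat:
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```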
**Questions**
1. Why does a query with a `where` clause return an error?
2. `select count(*) from member2` fails with the same error.
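A possible explanation for the asymmetry: with Hive's default `hive.fetch.task.conversion=more`, a simple projection runs as a local fetch task inside HiveServer2, while filters and aggregates launch a MapReduce job whose task containers must also have the Hudi classes. This hypothesis can be checked by forcing even the simple select through MapReduce; if the query below also fails, the task-side classpath is the likely culprit:

```sql
-- Disable fetch-task conversion so the plain select runs as an MR job too.
SET hive.fetch.task.conversion=none;
SELECT uid, fullname FROM member2 LIMIT 5;
```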
**Expected behavior**
Queries with a `where` clause or `count(*)` should return results, just as the plain `select *` does.
**Environment Description**
* Hudi version : 0.9.0
* Spark version : 2.4.5
* Hive version : 2.1.1
* Hadoop version : 3.0.0
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no