moranyuwen opened a new issue #3094:
URL: https://github.com/apache/hudi/issues/3094
I created a Hudi table `member2` using Spark. `select * from member2` succeeds, but `select count(*) from member2` fails.
Steps to reproduce the behavior:
1. Create the table with Spark:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieIndexConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.hudi.index.HoodieIndex
import org.apache.spark.sql.SaveMode

df.write.format("org.apache.hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option("hoodie.insert.shuffle.parallelism", "12")
  .option("hoodie.upsert.shuffle.parallelism", "12")
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "uid")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "hudipartition")
  .option("hoodie.table.name", "member2")
  .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://node-1:10000")
  .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "default")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "member2")
  .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "dt,dn")
  .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[MultiPartKeysValueExtractor].getName)
  .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
  .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
  .mode(SaveMode.Append)
  .save("/user/hdfs/hudi/hivetest2")
```
2. Query 1: a simple select succeeds.

```
0: jdbc:hive2://node-2:10000> select uid,fullname from member2;
INFO  : OK
+--------+-----------+
|  uid   | fullname  |
+--------+-----------+
| 10001  | name1     |
| 10002  | name2     |
| 10003  | name3     |
| 10004  | name4     |
| 10005  | name5     |
+--------+-----------+
```
3. Query 2: a select with a `where` clause fails.

```
0: jdbc:hive2://node-2:10000> select uid,fullname from member2 where fullname = "name1";
INFO  : Compiling command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271): select uid,fullname from member2 where fullname = "name1"
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:uid, type:int, comment:null), FieldSchema(name:fullname, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271); Time taken: 0.575 seconds
INFO  : Executing command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271): select uid,fullname from member2 where fullname = "name1"
WARN  :
INFO  : Query ID = hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_1612164367687_1569
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://node-1:8088/proxy/application_1612164367687_1569/
INFO  : Starting Job = job_1612164367687_1569, Tracking URL = http://node-1:8088/proxy/application_1612164367687_1569/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1612164367687_1569
INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
INFO  : 2021-06-16 17:43:52,469 Stage-1 map = 0%, reduce = 0%
INFO  : 2021-06-16 17:44:17,925 Stage-1 map = 100%, reduce = 0%
ERROR : Ended Job = job_1612164367687_1569 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 2  HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=hive_20210616174328_fbd72ca5-c343-4db8-b352-5a6ec4765271); Time taken: 52.498 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
```
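Note that "return code 2 from MapRedTask" hides the underlying exception; the real stack trace lives in the failed map task's YARN container logs (reachable via the tracking URL above). A common cause of this symptom when querying Hudi tables through Hive is the `hudi-hadoop-mr-bundle` jar being absent from the classpath of the MapReduce tasks. A hedged session-level sketch — the jar path below is an assumption, not taken from this report:

```sql
-- Session-level sketch; replace the path with your actual bundle location.
ADD JAR hdfs:///path/to/hudi-hadoop-mr-bundle-0.9.0.jar;
-- COW tables are read through HoodieParquetInputFormat; Hive's default
-- CombineHiveInputFormat can bypass it, so fall back to plain HiveInputFormat:
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```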
**Questions**
1. Why does a query with a `where` clause return an error?
2. `select count(*) from member2` fails with the same error.
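A possible explanation for the asymmetry: with Hive's default `hive.fetch.task.conversion=more`, a simple projection runs as a local fetch task inside HiveServer2, while filters and aggregates launch a MapReduce job whose task containers must also have the Hudi classes. This hypothesis can be checked by forcing even the simple select through MapReduce; if the query below also fails, the task-side classpath is the likely culprit:

```sql
-- Disable fetch-task conversion so the plain select runs as an MR job too.
SET hive.fetch.task.conversion=none;
SELECT uid, fullname FROM member2 LIMIT 5;
```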
**Expected behavior**
Queries with a `where` clause or `count(*)` should return results, just as the plain `select *` does.
**Environment Description**
* Hudi version : 0.9.0
* Spark version : 2.4.5
* Hive version : 2.1.1
* Hadoop version : 3.0.0
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no