RajasekarSribalan opened a new issue #1794:
URL: https://github.com/apache/hudi/issues/1794
**Describe the problem you faced**
Hi, we are doing upserts and deletes on Hudi COW tables. It is a Spark
streaming app that reads data from Kafka and upserts it into Hudi. Below is the
pseudocode:
1. var df = read from Kafka
2. df.persist() // we persist the DataFrame because a single DataFrame can
contain both upsert and delete records, so we filter them based on U or D
3. Filter only the upsert records and upsert them into Hudi
4. Filter only the delete records and delete them from Hudi
5. df.unpersist()
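The steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the actual job: the `op` flag field, the record key `id`, the precombine field `ts`, and the table name `users` are assumptions made for the example. The option keys are the Hudi datasource write settings; whether Hive sync should be enabled for the delete write is the open question of this issue, so it is left as a parameter.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_by_op(records: List[Record], op_field: str = "op") -> Tuple[List[Record], List[Record]]:
    """Split one micro-batch into upsert ("U") and delete ("D") records.

    The field name "op" is a hypothetical choice for this sketch.
    """
    upserts = [r for r in records if r.get(op_field) == "U"]
    deletes = [r for r in records if r.get(op_field) == "D"]
    return upserts, deletes


def hudi_write_options(table_name: str, operation: str, record_key: str,
                       precombine_field: str, hive_sync: bool) -> Dict[str, str]:
    """Build the Hudi datasource options for one write of the given operation."""
    if operation not in ("upsert", "delete"):
        raise ValueError("unsupported operation: " + operation)
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.hive_sync.enable": str(hive_sync).lower(),
    }


if __name__ == "__main__":
    # Hypothetical micro-batch read from Kafka.
    batch = [
        {"id": "1", "ts": "100", "op": "U"},
        {"id": "2", "ts": "101", "op": "D"},
        {"id": "3", "ts": "102", "op": "U"},
    ]
    upserts, deletes = split_by_op(batch)
    # One write per operation; an empty subset can simply be skipped.
    for operation, subset in (("upsert", upserts), ("delete", deletes)):
        if subset:
            options = hudi_write_options("users", operation, "id", "ts", hive_sync=True)
            print(operation, len(subset), options)
```

In the Spark job, each option map would be passed to the corresponding filtered DataFrame, along the lines of `df.write.format("org.apache.hudi").options(...).mode("append").save(basePath)`.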
While doing the delete, it throws the error below. My question: do we need
to sync with Hive even for the delete operation? Please confirm.
20/07/05 10:19:20 ERROR hive.HiveSyncTool: Got runtime exception when hive syncing
java.lang.IllegalArgumentException: Could not find any data file
written for commit [20200705101913__commit__COMPLETED], could not get schema
for table /user/admin/hudi/users, Metadata
:HoodieCommitMetadata{partitionToWriteStats={}, compacted=false,
extraMetadata={ROLLING_STAT={
"partitionToRollingStats" : {
"" : {
"d398058e-f8f4-4772-9fcb-012318ac8f47-0" : {
"fileId" : "d398058e-f8f4-4772-9fcb-012318ac8f47-0",
"inserts" : 989333,
"upserts" : 11,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 49443028
},
"eed1f67c-8c46-425f-b740-2e21b84c6f13-0" : {
"fileId" : "eed1f67c-8c46-425f-b740-2e21b84c6f13-0",
"inserts" : 1263360,
"upserts" : 16,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 49672386
},
"e9f38e55-acf2-4bd2-b568-def7361f2f29-0" : {
"fileId" : "e9f38e55-acf2-4bd2-b568-def7361f2f29-0",
"inserts" : 946616,
"upserts" : 6,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 45686395
},
"8a93afac-d60e-41bb-a3e1-edd793e2a932-0" : {
"fileId" : "8a93afac-d60e-41bb-a3e1-edd793e2a932-0",
"inserts" : 482202,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 49744729
**Environment Description**
* Hudi version : 0.5.2
* Spark version : Cloudera Spark 2.2.0
* Hive version : Cloudera Hive 1.1
* Hadoop version : 2.6
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : No