RajasekarSribalan opened a new issue #1794:
URL: https://github.com/apache/hudi/issues/1794


   **Describe the problem you faced**
   
   Hi, we are doing upserts and deletes on Hudi COW tables. It is a Spark Streaming app that reads data from Kafka and upserts it into Hudi. Below is the pseudocode:
   
   1. var df = read from Kafka
   2. df.persist() // we persist the DataFrame because a single batch can contain both upsert and delete records, so we filter on the U or D flag twice
   3. Filter only the upsert records and upsert them into Hudi
   4. Filter only the delete records and delete them from Hudi
   5. df.unpersist()
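   The steps above can be sketched as the following Scala micro-batch handler. This is a minimal illustration, not our exact code: the `op` column with `U`/`D` values, the record key/precombine field names, and `basePath` are all assumptions for the example, and the Hudi options shown are the standard datasource keys (`hoodie.datasource.write.operation` set to `upsert` or `delete`).
   
   ```scala
   import org.apache.spark.sql.{DataFrame, SaveMode}
   
   // One micro-batch: `df` is the batch read from Kafka.
   // Column name `op` and field names below are illustrative assumptions.
   def processBatch(df: DataFrame, basePath: String): Unit = {
     df.persist() // reused twice: once for upserts, once for deletes
   
     val hudiOpts = Map(
       "hoodie.table.name"                        -> "users",
       "hoodie.datasource.write.recordkey.field"  -> "id",
       "hoodie.datasource.write.precombine.field" -> "ts"
     )
   
     // 1) upsert the U records
     df.filter("op = 'U'")
       .write.format("org.apache.hudi") // "hudi" in newer releases
       .options(hudiOpts)
       .option("hoodie.datasource.write.operation", "upsert")
       .mode(SaveMode.Append)
       .save(basePath)
   
     // 2) delete the D records
     df.filter("op = 'D'")
       .write.format("org.apache.hudi")
       .options(hudiOpts)
       .option("hoodie.datasource.write.operation", "delete")
       .mode(SaveMode.Append)
       .save(basePath)
   
     df.unpersist()
   }
   ```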
   
   While doing the delete, it throws the error below. My question: do we need to sync with Hive even for the delete operation? Please confirm.
   
   ```
   20/07/05 10:19:20 ERROR hive.HiveSyncTool: Got runtime exception when hive syncing
   java.lang.IllegalArgumentException: Could not find any data file written for commit [20200705101913__commit__COMPLETED], could not get schema for table /user/admin/hudi/users, Metadata :HoodieCommitMetadata{partitionToWriteStats={}, compacted=false, extraMetadata={ROLLING_STAT={
     "partitionToRollingStats" : {
       "" : {
         "d398058e-f8f4-4772-9fcb-012318ac8f47-0" : {
           "fileId" : "d398058e-f8f4-4772-9fcb-012318ac8f47-0",
           "inserts" : 989333,
           "upserts" : 11,
           "deletes" : 0,
           "totalInputWriteBytesToDisk" : 0,
           "totalInputWriteBytesOnDisk" : 49443028
         },
         "eed1f67c-8c46-425f-b740-2e21b84c6f13-0" : {
           "fileId" : "eed1f67c-8c46-425f-b740-2e21b84c6f13-0",
           "inserts" : 1263360,
           "upserts" : 16,
           "deletes" : 0,
           "totalInputWriteBytesToDisk" : 0,
           "totalInputWriteBytesOnDisk" : 49672386
         },
         "e9f38e55-acf2-4bd2-b568-def7361f2f29-0" : {
           "fileId" : "e9f38e55-acf2-4bd2-b568-def7361f2f29-0",
           "inserts" : 946616,
           "upserts" : 6,
           "deletes" : 0,
           "totalInputWriteBytesToDisk" : 0,
           "totalInputWriteBytesOnDisk" : 45686395
         },
         "8a93afac-d60e-41bb-a3e1-edd793e2a932-0" : {
           "fileId" : "8a93afac-d60e-41bb-a3e1-edd793e2a932-0",
           "inserts" : 482202,
           "upserts" : 0,
           "deletes" : 0,
           "totalInputWriteBytesToDisk" : 0,
           "totalInputWriteBytesOnDisk" : 49744729
   ```
   
   **Expected behavior**
   
   The delete operation (and the accompanying Hive sync, if it is required at all) should complete successfully, just like the upsert path does.
   
   **Environment Description**
   
   * Hudi version : 0.5.2
   
   * Spark version : Cloudera Spark 2.2.0
   
   * Hive version : Cloudera Hive 1.1
   
   * Hadoop version : 2.6
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : No
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

