DavidZ1 opened a new issue, #8267:
URL: https://github.com/apache/hudi/issues/8267

   **Describe the problem you faced**
   
   We have a Flink job that consumes Kafka messages and writes them into a 
Hudi table. It is a MOR table using the bucket index, and the write operation 
is upsert. The table has two levels of partitions: day and hour.
   
   After the Flink job had been running for some time, we found that the log 
files in each hour partition were not being compacted into parquet files. We 
also checked the compaction request file and found that it did not contain all 
of the log files. How can this be solved? We would also like to know how to 
tell whether the log files of a given partition have been compacted.
   
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Spark version : 3.2.1
   
   * Hive version : 3.2.1
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : COSN
   
   * Running on Docker? (yes/no) : yes
   
   
   **Additional context**
   
   1. Hudi config
   
   ```java
   checkpoint.interval=300
   checkpoint.timeout=900
   compaction.max_memory=1024
   payload.class.name=org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload
   compaction.delta_commits=5
   compaction.trigger.strategy=num_or_time
   compaction.delta_seconds=3600
   clean.policy=KEEP_LATEST_COMMITS
   clean.retain_commits=1
   hoodie.bucket.index.num.buckets=50
   archive.max_commits=50
   archive.min_commits=40
   compaction.async.enabled=true
   write.operation=upsert
   table.type=MERGE_ON_READ
   index.type=BUCKET
   checkpoint.incremental.enable=true
   ``` 
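
As a rough illustration of what `compaction.trigger.strategy=num_or_time` should mean with the values above, here is a minimal sketch. It is not Hudi code: the class and method names are hypothetical, the exact boundary comparison inside Hudi may differ, and it assumes roughly one delta commit per checkpoint (about every 300 seconds, which matches the spacing of the delta commits in the timeline quoted later).

```python
from dataclasses import dataclass


@dataclass
class CompactionTrigger:
    """Hypothetical model of the num_or_time trigger, not a Hudi class."""
    delta_commits: int = 5      # compaction.delta_commits
    delta_seconds: int = 3600   # compaction.delta_seconds

    def should_schedule(self, commits_since_last: int, seconds_since_last: int) -> bool:
        # num_or_time: schedule when EITHER threshold has been reached.
        return (commits_since_last >= self.delta_commits
                or seconds_since_last >= self.delta_seconds)


trigger = CompactionTrigger()
checkpoint_interval = 300  # assumed to be seconds

# Five checkpoints give five delta commits after about 1500 seconds.
print(trigger.should_schedule(5, 5 * checkpoint_interval))
# Two delta commits after 600 seconds reach neither threshold.
print(trigger.should_schedule(2, 2 * checkpoint_interval))
```

Under these assumptions the commit-count condition is reached well before the one-hour time condition, so a compaction plan would be expected about every 25 minutes.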
   
   
   2. hoodie.properties
   
   ```java
   
   hoodie.table.precombine.field=acquire_timestamp
   hoodie.datasource.write.drop.partition.columns=false
   hoodie.table.partition.fields=pt,ht
   hoodie.table.type=MERGE_ON_READ
   hoodie.archivelog.folder=archived
   hoodie.table.cdc.enabled=false
   hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload
   hoodie.table.version=5
   hoodie.timeline.layout.version=1
   hoodie.table.recordkey.fields=vin,acquire_timestamp
   hoodie.datasource.write.partitionpath.urlencode=false
   hoodie.table.name=ods_icv_can_hudi_temp
   hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexAvroKeyGenerator
   hoodie.compaction.record.merger.strategy=eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
   hoodie.datasource.write.hive_style_partitioning=true
   ``` 
   
   3. DAG
   
![2012fe42112700a0bae99e5b95054eb](https://user-images.githubusercontent.com/30795397/226840153-71dc771b-7322-4605-8f9f-1006c3259205.png)
   
   4. Data file
   
   
![9a882f5a6ea5f1ec3c156e297ea0636](https://user-images.githubusercontent.com/30795397/226839970-694da791-a9aa-4589-838a-561d2de2eaee.png)
   
   For fileId 00000023-96fb-4ca0-b0ae-0547b5898b3b, the parquet file is 40 MB, 
but the avro log files total more than 1500 MB, so some avro logs have not been 
compacted into parquet.
   
   
![2434969ad9f5389e2f5b051fac3ccd7](https://user-images.githubusercontent.com/30795397/226840070-bc4be834-8c5e-4bcd-90f7-df8fa7f38938.png)
   
   We found that the compaction.requested file under the .hoodie directory 
does not contain all of the avro log files.
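
One way to check from a partition listing whether log files are still waiting for compaction is to compare instants embedded in the file names. The sketch below is only an illustration: it assumes the usual Hudi naming (`<fileId>_<writeToken>_<instant>.parquet` for base files and `.<fileId>_<baseInstant>.log.<version>_<writeToken>` for log files), the function name is hypothetical, and the write tokens and instants in the sample list are made up for the example. A log file is treated as not yet compacted when its base instant is not older than the newest parquet instant of the same file group.

```python
import re
from collections import defaultdict

# Assumed Hudi naming conventions; verify against the real listing.
BASE_RE = re.compile(r"^(?P<file_id>.+?)_\d+-\d+-\d+_(?P<instant>\d+)\.parquet$")
LOG_RE = re.compile(r"^\.(?P<file_id>.+?)_(?P<instant>\d+)\.log\.\d+(?:_\d+-\d+-\d+)?$")


def uncompacted_logs(file_names):
    """Return {file_id: [log file names]} not yet merged into a base file."""
    latest_base = {}
    logs = defaultdict(list)
    for name in file_names:
        base = BASE_RE.match(name)
        if base:
            fid = base["file_id"]
            # Instants are fixed-width digit strings, so string max() works.
            latest_base[fid] = max(latest_base.get(fid, ""), base["instant"])
            continue
        log = LOG_RE.match(name)
        if log:
            logs[log["file_id"]].append((log["instant"], name))
    pending = {}
    for fid, entries in logs.items():
        newest = latest_base.get(fid, "")
        names = sorted(n for instant, n in entries if instant >= newest)
        if names:
            pending[fid] = names
    return pending


# Hypothetical listing for one file group (tokens and instants are invented).
FID = "00000023-96fb-4ca0-b0ae-0547b5898b3b"
files = [
    f"{FID}_1-4-0_20230321131650502.parquet",
    f".{FID}_20230321125010487.log.1_1-4-0",  # older slice, already compacted
    f".{FID}_20230321131650502.log.1_1-4-0",  # written after the latest base file
    f".{FID}_20230321131650502.log.2_1-4-0",
]
print(uncompacted_logs(files))
```

If most log files in a partition carry a base instant equal to or newer than the latest parquet instant, those logs have not been compacted yet. Logs with an older base instant belong to earlier file slices and are left for the cleaner.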
   
   **Stacktrace**
   
   1. Clean file exception 
   
   ```
   2023-03-22 14:32:37.627 [pool-18-thread-1] WARN  org.apache.hudi.table.action.clean.CleanActionExecutor [] - Failed to perform previous clean operation, instant: [==>20230322143231759__clean__REQUESTED]
   java.lang.NullPointerException: Expected a non-null value. Got null
        at org.apache.hudi.common.util.Option.<init>(Option.java:65) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.common.util.Option.of(Option.java:76) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.table.action.clean.CleanActionExecutor.runClean(CleanActionExecutor.java:230) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:187) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.table.action.clean.CleanActionExecutor.lambda$execute$8(CleanActionExecutor.java:256) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_332]
        at org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:250) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.clean(HoodieFlinkCopyOnWriteTable.java:322) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:554) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:758) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:730) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55) ~[blob_p-7584645ba23f46692000bbfac6ef844cbd0e30ce-451b376bd445dd495f01c72e3dff67e5:?]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) [?:1.8.0_332]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_332]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_332]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
   ```
   
   
   2. When we use the MOR table with the insert operation, warning logs 
related to compaction appear, but the MOR table with upsert does not produce them. 
   The following exception occurs:
   
   `2023-03-21 14:32:36.033 [JettyServerThreadPool-334] WARN  
org.apache.hudi.timeline.service.RequestHandler [] - Bad request response due 
to client view behind server view. Last known instant from client was 
20230321142047972 but server has the following timeline 
[[20230321125010487__deltacommit__COMPLETED], 
[20230321125525961__deltacommit__COMPLETED], 
[20230321130051999__deltacommit__COMPLETED], 
[20230321130617771__deltacommit__COMPLETED], 
[20230321131133084__deltacommit__COMPLETED], 
[20230321131650502__deltacommit__COMPLETED], 
[==>20230321132210140__compaction__INFLIGHT], 
[20230321132212886__deltacommit__COMPLETED], 
[==>20230321132729719__compaction__INFLIGHT], 
[20230321132731672__deltacommit__COMPLETED], 
[==>20230321133253906__compaction__INFLIGHT], 
[20230321133256109__deltacommit__COMPLETED], 
[==>20230321133820416__compaction__INFLIGHT], 
[20230321133822486__deltacommit__COMPLETED], 
[==>20230321134348164__compaction__INFLIGHT], 
[20230321134350553__deltacommit__COMPLETED], 
[20230321134912462__deltacommit__COMPLETED], 
[20230321135434761__deltacommit__COMPLETED], 
[20230321140440297__rollback__COMPLETED], 
[20230321140440947__rollback__COMPLETED], 
[20230321140443670__deltacommit__COMPLETED], 
[20230321140445923__rollback__COMPLETED], 
[20230321140450567__rollback__COMPLETED], 
[20230321140454064__rollback__COMPLETED], 
[20230321140456989__rollback__COMPLETED], 
[==>20230321140910025__compaction__REQUESTED], 
[20230321140913981__deltacommit__COMPLETED], 
[==>20230321141505445__compaction__REQUESTED], 
[20230321141508195__deltacommit__COMPLETED], 
[20230321142047972__deltacommit__COMPLETED], 
[20230321142644822__deltacommit__COMPLETED]]`
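
To see at a glance which compactions in such a timeline dump never finished, the instants can be parsed out of the log text. This is a small sketch that works only on the text format shown in the warning above; the function name is hypothetical, and the sample string is an excerpt of instants copied from that log.

```python
import re

# Matches entries such as [==>20230321132210140__compaction__INFLIGHT]
INSTANT_RE = re.compile(r"\[(?:==>)?(\d{17})__([a-z]+)__([A-Z]+)\]")


def pending_compactions(timeline_text):
    """Return (instant, state) for compactions that are not COMPLETED."""
    return [
        (instant, state)
        for instant, action, state in INSTANT_RE.findall(timeline_text)
        if action == "compaction" and state != "COMPLETED"
    ]


# Excerpt of the timeline from the warning above.
sample = """
[20230321131650502__deltacommit__COMPLETED],
[==>20230321132210140__compaction__INFLIGHT],
[20230321132212886__deltacommit__COMPLETED],
[==>20230321140910025__compaction__REQUESTED],
[20230321140913981__deltacommit__COMPLETED]
"""

for instant, state in pending_compactions(sample):
    print(instant, state)
```

Applied to the full timeline in the warning, this lists five compactions in INFLIGHT state and two in REQUESTED state, and none of them appears as completed.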
   
   
   
   

