geonyeongkim commented on issue #8164: URL: https://github.com/apache/hudi/issues/8164#issuecomment-1476002948
### 1. Did you enable the ckp yet? The Flink sink relies on the ckp success event for Hudi transaction committing

"Ckp" means checkpoint, right? As shown in the attached picture, checkpoints complete normally, but there are still no files in HDFS while the job is consuming Kafka messages. Moreover, the problem is that the offsets are still being committed to the Kafka broker.

**checkpoint**

**hdfs directory**

### 2. Both bulk_insert and append_write use the BulkInsertWriterHelper to write the parquet files directly; there are no UPSERTs. If FLINK_STATE is used, things are very different: the StreamWriteFunction kicks in

Then, in the FLINK_STATE case, can you explain the difference between bulk_insert and append in detail?

### 3. You can just set up the compression options within the Flink SQL options, or the HoodiePipeline

Following that guidance, I added the setting below and restarted the job:

```java
HoodiePipeline.builder("xxx")
    .option("hoodie.parquet.compression.codec", "gzip")
```

However, gzip compression is still not applied.

---

I understand that compression can be difficult to apply in a streaming write path on Hadoop. **But it is very strange that it does not work even for bulk_insert.**
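For reference, the checkpoint setup discussed in point 1 corresponds to roughly the following `flink-conf.yaml` fragment; the interval and directory are illustrative values, not our actual production settings:

```yaml
# Enable periodic checkpoints; the Hudi Flink sink only commits
# its transaction when a checkpoint completes successfully.
execution.checkpointing.interval: 60s
execution.checkpointing.mode: EXACTLY_ONCE

# Where checkpoint state is stored (placeholder path).
state.checkpoints.dir: hdfs:///flink/checkpoints
```

Our job shows checkpoints succeeding with an equivalent configuration, which is why the missing Hudi files in HDFS are surprising.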
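For completeness, the "Flink SQL options" route from point 3 would look roughly like the sketch below; the table name, schema, and path are placeholders, and I am assuming the codec option is simply passed through the `WITH` clause the same way it is passed to the builder:

```sql
-- Sketch only: names, schema, and path are hypothetical.
CREATE TABLE hudi_sink (
  id STRING PRIMARY KEY NOT ENFORCED,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_sink',
  'write.operation' = 'bulk_insert',
  'hoodie.parquet.compression.codec' = 'gzip'
);
```

If this is not the expected way to propagate `hoodie.parquet.compression.codec` for bulk_insert, please correct me.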
