geonyeongkim commented on issue #8164:
URL: https://github.com/apache/hudi/issues/8164#issuecomment-1476002948

   ### 1. Did you enable the ckp yet? Flink sink relies on the ckp success event 
for Hudi transaction committing;
   
   Ckp means checkpoint, right?
   
   As shown in the attached picture, checkpoints complete normally.
   
   But there is still no file in HDFS while consuming the Kafka messages.
   
   Moreover, the problem is that the offsets are still being committed to the Kafka broker, even though nothing is written.
   
   **checkpoint**
   
![image](https://user-images.githubusercontent.com/31622350/226316175-b20f6f51-bc5f-490f-bc1b-58b03df20ec8.png)
   
   **hdfs directory**
   
![image](https://user-images.githubusercontent.com/31622350/226316796-b89fb2ec-a754-47ed-8cee-0558c809ef45.png)
   
   ### 2. Both bulk_insert and append_write use the BulkInsertWriterHelper to 
write the parquet files directly; there are no UPSERTs. If FLINK_STATE is used, 
things are very different: the StreamWriteFunction would kick in;
   
   Then, in the FLINK_STATE case, can you explain the difference between 
bulk_insert and append in detail?
   
   ### 3. You can just set up the compress options within the Flink SQL 
options, or the HoodiePipeline
   
   Following that guidance, I added the setting below and restarted the job.
   
   ```java
   HoodiePipeline.builder("xxx")
       .option("hoodie.parquet.compression.codec", "gzip")
   ```
   
   However, gzip compression is still not applied to the output files.
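   For context, here is roughly how the full pipeline is defined. This is a minimal sketch; the table name, columns, key fields, and path are placeholders, not our real configuration:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Hypothetical table definition; only the compression option is the
   // setting under discussion, the rest is placeholder schema/config.
   Map<String, String> options = new HashMap<>();
   options.put("path", "hdfs:///tmp/hudi/xxx");
   options.put("table.type", "COPY_ON_WRITE");
   options.put("write.operation", "bulk_insert");
   options.put("hoodie.parquet.compression.codec", "gzip");
   
   HoodiePipeline.Builder builder = HoodiePipeline.builder("xxx")
       .column("id BIGINT")
       .column("name STRING")
       .column("ts TIMESTAMP(3)")
       .pk("id")
       .options(options);
   
   // rowDataStream is the upstream DataStream<RowData> from the Kafka source
   builder.sink(rowDataStream, false);
   ```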
   
   ---
   
   I understand that applying compression in streaming writes to Hadoop can be 
difficult.
   
   **But it's very strange that bulk_insert doesn't work.**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
