noahtaite commented on issue #10183:
URL: https://github.com/apache/hudi/issues/10183#issuecomment-1828629712

   @Cpandey43
   
   Hey! I'm another Hudi 0.13.1 MOR user, so I thought I'd come by to lend a 
hand and dig a bit deeper into the problem you're reporting. **To be clear - 
I'm not a Hudi developer.**
   
   **First question** - are you ingesting any updated records or only new 
records? From what I understand, the expected behaviour for a MOR table (which 
I can confirm you are using, based on your configurations + properties) is:
   - new records are written to base .parquet files
   - updates to records are written to avro .log files
   
   New records need to be associated with a parquet base file before updates 
to them can be logged to avro files. As I understand it, this is partly so that 
a read optimized query will correctly show new records in the data set. 
However, updates will not show up in read optimized queries until compaction is 
run against those log files. That is how I've always understood the tradeoff 
here.
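
   To illustrate how I picture that tradeoff, here is a toy sketch in plain 
Python. It is not Hudi code: the `ToyMorTable` class and its method names are 
hypothetical, and plain dicts/lists stand in for the parquet base files and 
avro log files. The `snapshot` method is my assumption of how a query that 
merges base and log files would behave.

```python
# Toy model only: NOT the Hudi API. All names here are hypothetical.
class ToyMorTable:
    def __init__(self):
        self.base = {}  # record key -> value; stands in for base .parquet files
        self.log = []   # (key, value) updates; stands in for avro .log files

    def upsert(self, key, value):
        if key in self.base:
            # Update to an existing record: appended to the log.
            self.log.append((key, value))
        else:
            # New record: written straight to the base file.
            self.base[key] = value

    def read_optimized(self):
        # Reads base files only, so logged updates are not visible yet.
        return dict(self.base)

    def snapshot(self):
        # Merges logged updates on top of the base files at read time.
        merged = dict(self.base)
        for key, value in self.log:
            merged[key] = value
        return merged

    def compact(self):
        # Compaction folds the log into the base files and clears the log.
        self.base = self.snapshot()
        self.log = []


table = ToyMorTable()
table.upsert("id-1", "v1")  # new record, goes to base
table.upsert("id-1", "v2")  # update, goes to the log
before = table.read_optimized()  # still shows "v1"
table.compact()
after = table.read_optimized()   # now shows "v2"
```

   In this sketch the read optimized view only catches up with the update 
after `compact()` runs, which is the behaviour described above.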
   
   **Second question** - what is the behaviour you expect to see with async 
clustering? It should indeed be a "no-operation" until you asynchronously 
schedule + execute clustering, which stitches together small files and can 
improve query performance by giving you control over data locality. I suggest 
taking a look at the [clustering guide](https://hudi.apache.org/docs/clustering/) 
and the [0.13.1 
configurations](https://hudi.apache.org/docs/0.13.1/configurations) to see what 
suits your needs:
   ```
   hoodie.clustering.async.enabled
   hoodie.clustering.inline
   hoodie.clustering.schedule.inline
   ```
   
   Out of the box these are all disabled, so I wouldn't expect to see any 
clustering actions in your timeline. Also, be aware that once you do use 
clustering, it will show up as a ".replacecommit" in your Hudi timeline.
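
   As a rough sketch of how these three keys could be passed as writer 
options, here is a small helper. The function name `clustering_options` and 
the mode names are hypothetical (my own, not part of Hudi). Only the three 
config keys come from the list above, and I'm assuming you enable one of them 
at a time.

```python
# Hypothetical helper: only the three config keys are real Hudi settings.
CLUSTERING_KEYS = (
    "hoodie.clustering.async.enabled",
    "hoodie.clustering.inline",
    "hoodie.clustering.schedule.inline",
)

# Mode names are made up for this sketch; each maps to the one key it enables.
MODE_TO_KEY = {
    "async": "hoodie.clustering.async.enabled",
    "inline": "hoodie.clustering.inline",
    "schedule_inline": "hoodie.clustering.schedule.inline",
}


def clustering_options(mode="off"):
    """Return the three clustering flags as string-valued writer options."""
    # Start from the out-of-the-box state: everything disabled.
    opts = {key: "false" for key in CLUSTERING_KEYS}
    if mode == "off":
        return opts
    if mode not in MODE_TO_KEY:
        raise ValueError(f"unknown clustering mode: {mode!r}")
    opts[MODE_TO_KEY[mode]] = "true"
    return opts


# Possible usage with a Spark DataFrame `df` (not executed here):
#   df.write.format("hudi").options(**clustering_options("async")) ...
async_opts = clustering_options("async")
```

   With `mode="off"` this mirrors the defaults, which is why no clustering 
actions would appear in the timeline.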

