sam-wmt opened a new issue #2423:
URL: https://github.com/apache/hudi/issues/2423


   Job performance degraded over the course of 2-3 weeks and the job eventually 
began to suffer from significant timeout exceptions when dealing with ADLS 
object storage.  While working with the Azure storage team, they noted an 
excessive number of sequential Create Dir operations from the workload and 
asked whether we could investigate what within the Hudi libraries might be 
causing this and what could be done about it.  The main note is that we're only 
running two workloads against this container, so our IO and operations/sec are 
well within the norm; where we're seeing issues is specifically with Delete and 
Create file operations.
   
   For a single batch of data we saw 65k create directory operations (30k of 
which timed out) issued in a very small window of time, which we believe put 
the job / storage account into a bad state.
   
   Below are some of the operation types being issued by our Hudi workload 
across the day:
   _(chart of operation counts failed to upload)_
   
   
   **Runtime details:**
   Hudi Release: 0.6.0
   Spark: Azure Databricks runtime (lite) 2.4; Workers: Standard_D16s_v3 
(16 cores / 64GB RAM each, 20 workers)
   Streaming Duration: We tried both 10-minute and 30-minute triggers on the table
   Source: Kafka cluster with 105 partitions; average ingestion rate of ~500 
records/sec with spikes of up to 4000/sec (~3KB records)
   Storage: Azure ADLS Gen2 / StorageV2 (general purpose v2, Standard/Hot 
storage, read-access geo-redundant storage (RA-GRS))
   
   **Table details:**
   Table Info: Merge On Read, inline compaction every 18 commits, 1 retained 
commit per key
   Table seeded via live stream; no Insert/Bulk Insert leveraged
   As reported from the CLI / last ### compaction:
   Row Count: 1,393,797,816 (slowly growing)
   Data Size:  542.9 GB
   File Count: 15,255
   Partitions: Randomly (evenly) distributed into 1024 partitions
   
   **Hudi Configuration:**
   Primary Options:
   ```scala
   .option(HoodieWriteConfig.UPSERT_PARALLELISM, String.valueOf(320))
   .option(HoodieWriteConfig.INSERT_PARALLELISM, String.valueOf(320))
   .option(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP, String.valueOf(1))
   .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, String.valueOf(18))
   .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, String.valueOf(true))
   .option(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES, String.valueOf(256 * 1024 * 1024))
   .option(HoodieStorageConfig.PARQUET_BLOCK_SIZE_BYTES, String.valueOf(256 * 1024 * 1024))
   .option(HoodieStorageConfig.PARQUET_COMPRESSION_CODEC, "snappy")
   ```
   Additional Options:
   ```scala
   "hoodie.compaction.strategy" -> "org.apache.hudi.table.action.compact.strategy.UnBoundedCompactionStrategy",
   "hoodie.bloom.index.prune.by.ranges" -> "false"
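   For completeness, here is a minimal sketch of how these options are wired 
into the batch writer. The input `DataFrame`, table name, record/partition key 
fields, and storage path below are placeholders, not our production values:
   ```scala
   // Sketch only: `batch` is one micro-batch of the Kafka stream; the key
   // fields, table name, and path are placeholders.
   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.hudi.config.{HoodieCompactionConfig, HoodieStorageConfig, HoodieWriteConfig}
   import org.apache.spark.sql.{DataFrame, SaveMode}

   def writeBatch(batch: DataFrame): Unit = {
     batch.write
       .format("hudi")
       // Merge On Read table, as described above
       .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
       .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")            // placeholder key field
       .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition") // placeholder partition field
       .option(HoodieWriteConfig.TABLE_NAME, "events")                          // placeholder table name
       // Primary options from the report
       .option(HoodieWriteConfig.UPSERT_PARALLELISM, String.valueOf(320))
       .option(HoodieWriteConfig.INSERT_PARALLELISM, String.valueOf(320))
       .option(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP, String.valueOf(1))
       .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, String.valueOf(18))
       .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, String.valueOf(true))
       .option(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES, String.valueOf(256 * 1024 * 1024))
       .option(HoodieStorageConfig.PARQUET_BLOCK_SIZE_BYTES, String.valueOf(256 * 1024 * 1024))
       .option(HoodieStorageConfig.PARQUET_COMPRESSION_CODEC, "snappy")
       // Additional options from the report
       .option("hoodie.compaction.strategy",
         "org.apache.hudi.table.action.compact.strategy.UnBoundedCompactionStrategy")
       .option("hoodie.bloom.index.prune.by.ranges", "false")
       .mode(SaveMode.Append)
       .save("abfss://container@account.dfs.core.windows.net/path/to/table")   // placeholder path
   }
   ```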
      


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
