sam-wmt opened a new issue #2423:
URL: https://github.com/apache/hudi/issues/2423
Job performance degraded over the course of 2-3 weeks and eventually the job started hitting significant timeout exceptions against ADLS object storage. Working with the Azure storage team, they noted an excessive number of sequential Create Directory operations from the workload and asked whether we could investigate what within the Hudi libraries might be causing this, and what could be done about it. The main note is that we're only running two workloads against this container, so our I/O and operations/sec are well within the norm; where we're seeing issues is specifically with Delete and Create file operations.
For a single batch of data we saw 65k create-directory operations (30k of which timed out) issued within a very small window of time, which we believe put the job/storage account into a bad state.
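Some rough arithmetic on the burst described above, using only the figures reported in this issue (the per-partition breakdown assumes the batch touched all 1024 partitions, which is an assumption, not something we measured):

```python
# Back-of-the-envelope numbers for the create-directory burst.
# Figures come from the observations above; the per-partition split
# assumes the batch wrote to all 1024 partitions (an assumption).
create_dir_ops = 65_000
timed_out = 30_000
partitions = 1024

timeout_rate = timed_out / create_dir_ops
ops_per_partition = create_dir_ops / partitions

print(f"timeout rate: {timeout_rate:.0%}")                        # 46%
print(f"create-dir ops per partition: {ops_per_partition:.1f}")   # 63.5
```

In other words, nearly half of the create-directory calls in that batch timed out, and a batch spanning every partition would issue roughly 60+ such calls per partition.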
Below are some of the operation types issued by our Hudi workload across the day:
*(chart of daily operation counts by type; the original image failed to upload)*
**Runtime details:**
- Hudi Release: 0.6.0
- Spark: Azure Databricks runtime (lite) 2.4
- Workers: 20 × Standard_D16s_v3 (16 cores, 64 GB RAM each)
- Streaming duration: we tried both 10 minutes and 30 minutes on the table
- Source: Kafka cluster, 105 partitions; average ingestion rate of ~500 records/sec with spikes of up to 4,000/sec (~3 KB records)
- Storage: Azure ADLSV2 / StorageV2 (general purpose v2, Standard/Hot storage, read-access geo-redundant storage (RA-GRS))
**Table details:**
- Table type: Merge On Read, inline compaction every 18 commits, 1 retained commit per key
- Table seeded via live stream; no Insert/Bulk Insert leveraged
- As reported from the CLI / last ### compaction:
  - Row count: 1,393,797,816 (slowly growing)
  - Data size: 542.9 GB
  - File count: 15,255
  - Partitions: 1024, with records randomly (evenly) distributed
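Working out averages from the table stats above helps show the shape of the file layout (treating the reported 542.9 GB as GiB for the conversion):

```python
# Derived averages from the table stats reported in this issue.
# Note the average file size is far below the configured 256 MB
# hoodie parquet max file size, i.e. the table is made of many small files.
data_size_gib = 542.9
file_count = 15_255
partitions = 1024

avg_file_mib = data_size_gib * 1024 / file_count
files_per_partition = file_count / partitions

print(f"average file size: {avg_file_mib:.1f} MiB")       # 36.4 MiB
print(f"files per partition: {files_per_partition:.1f}")  # 14.9
```

With ~15 files of ~36 MiB each per partition spread across 1024 partitions, per-file and per-directory metadata operations against the object store will dominate over raw I/O.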
**Hudi Configuration:**

Primary options:

```scala
.option(HoodieWriteConfig.UPSERT_PARALLELISM, String.valueOf(320))
.option(HoodieWriteConfig.INSERT_PARALLELISM, String.valueOf(320))
.option(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP, String.valueOf(1))
.option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, String.valueOf(18))
.option(HoodieCompactionConfig.INLINE_COMPACT_PROP, String.valueOf(true))
.option(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES, String.valueOf(256 * 1024 * 1024))
.option(HoodieStorageConfig.PARQUET_BLOCK_SIZE_BYTES, String.valueOf(256 * 1024 * 1024))
.option(HoodieStorageConfig.PARQUET_COMPRESSION_CODEC, "snappy")
```

Additional options:

```scala
"hoodie.compaction.strategy" -> "org.apache.hudi.table.action.compact.strategy.UnBoundedCompactionStrategy",
"hoodie.bloom.index.prune.by.ranges" -> "false"
```
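For readers without the Hudi constants at hand, here is a sketch of the same configuration as plain string keys. The key names below are my reading of what these `HoodieWriteConfig`/`HoodieCompactionConfig`/`HoodieStorageConfig` constants resolve to in Hudi 0.6.0; they are assumptions and should be verified against that release before use:

```python
# Assumed plain-string equivalents of the config constants used above
# (key names should be double-checked against the Hudi 0.6.0 sources).
hudi_options = {
    "hoodie.upsert.shuffle.parallelism": "320",
    "hoodie.insert.shuffle.parallelism": "320",
    "hoodie.cleaner.commits.retained": "1",
    "hoodie.compact.inline.max.delta.commits": "18",
    "hoodie.compact.inline": "true",
    "hoodie.parquet.max.file.size": str(256 * 1024 * 1024),
    "hoodie.parquet.block.size": str(256 * 1024 * 1024),
    "hoodie.parquet.compression.codec": "snappy",
    "hoodie.compaction.strategy":
        "org.apache.hudi.table.action.compact.strategy.UnBoundedCompactionStrategy",
    "hoodie.bloom.index.prune.by.ranges": "false",
}

print(hudi_options["hoodie.parquet.max.file.size"])  # 268435456
```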
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]