jakubfijolek commented on PR #3469:
URL: https://github.com/apache/celeborn/pull/3469#issuecomment-3426135058

   Hi, I'd like to add to this thread, as I think this change addresses one of the bugs I'm experiencing.
   
   **TL;DR: If I mix the SSD and S3 tiers on a single worker instance, LocalTierWriter cannot evict to or create partitions on S3. Mixed tiers do work if I run separate workers for the S3 and SSD tiers, but in that setup a worker that fills up has nowhere to evict to.**
   
   Observed issue: the S3 eviction path goes through LocalTierWriter/NIO → NoSuchFileException, so the offload from SSD never happens and large shuffles fail.
   
   Build & env
   Celeborn 0.6.0-SNAPSHOT (git b537798e3), Scala 2.13, Hadoop 3.3.6 s3a, AWS SDK 1.12.x.
   S3 tier enabled and initialized:
   
   > StorageManager: Initialize S3 support with path s3a://<bucket>/celeborn/
   
   
   Key config:
   
   > celeborn.storage.availableTypes=MEMORY,SSD,S3
   > celeborn.worker.storage.storagePolicy.createFilePolicy=MEMORY,SSD,S3
   > celeborn.worker.storage.storagePolicy.evictPolicy=MEMORY,SSD,S3
   > celeborn.storage.s3.dir=s3a://<bucket>/celeborn/
   > celeborn.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
   > celeborn.worker.storage.disk.reserve.size=5G  
   > # capacity set to ~3.0 TiB; SSD fills during big shuffle
   
   
   Symptom
   When the SSD reaches the high-usage threshold, eviction to S3 does not start. The worker flips to HIGH_DISK_USAGE, and subsequent creates/writes still go through LocalTierWriter, which uses NIO against an s3a: URI and fails with NoSuchFileException (a minimal repro sketch follows the stack trace below).
   Example stack (many similar lines):
   
   > ERROR PushDataHandler: Exception encountered when write.
   > java.nio.file.NoSuchFileException: s3a:/<bucket>/celeborn/.../application_.../0/173-58-1
   >   at java.nio.channels.FileChannel.open(FileChannel.java:298)
   >   at org.apache.celeborn.common.util.FileChannelUtils.createWritableFileChannel(FileChannelUtils.java:28)
   >   at org.apache.celeborn.service.deploy.worker.storage.LocalTierWriter.channel(TierWriter.scala:399)
   >   at org.apache.celeborn.service.deploy.worker.storage.LocalTierWriter.genFlushTask(TierWriter.scala:410)
   >   at org.apache.celeborn.service.deploy.worker.storage.TierWriterBase.flush(TierWriter.scala:195)
   >   at org.apache.celeborn.service.deploy.worker.storage.LocalTierWriter.writeInternal(TierWriter.scala:419)
   >   ...
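   
   For what it's worth, the failure mode is reproducible outside Celeborn on a Linux host: java.nio treats the s3a: URI string as an ordinary local path whose parent directories do not exist. A minimal sketch (the bucket and partition path below are made up for illustration):
   
   ```scala
   import java.nio.channels.FileChannel
   import java.nio.file.{Paths, StandardOpenOption}
   
   object NioOnS3aPath {
     def main(args: Array[String]): Unit = {
       // NIO parses the s3a: URI as a plain relative path on the local
       // filesystem; since "s3a:/my-bucket/..." has no existing parent
       // directory, open() throws the same NoSuchFileException as the log.
       val path = Paths.get("s3a:/my-bucket/celeborn/app/0/173-58-1")
       FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE)
     }
   }
   ```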
   
   
   Disk monitor shows the SSD is full from Celeborn’s perspective:
   
   > WARN DeviceMonitor: /mnt/celeborn usage is above threshold...
   >   ... usage(Report by Celeborn): { total:2.9 TiB, free:0.0 B }
   > DEBUG StorageManager: ... usableSpace:0 ... status: HIGH_DISK_USAGE
   > 
   
   Commit phase fails with many partitions not committed, due to the same NIO/S3 path mismatch:
   
   > ERROR Controller: Commit file ... failed.
   > java.nio.file.NoSuchFileException: s3a:/<bucket>/celeborn/.../165-20-1
   > ...
   > WARN Controller: CommitFiles ... 291 committed primary, 47 failed primary, 563 committed replica, 118 failed replica.
   > 
   
   Cleaner thread also hits a DFS handle issue (likely a side effect of the wrong writer path):
   
   > ERROR worker-expired-shuffle-cleaner ... 
   > java.lang.NullPointerException: ... FileSystem.delete(...) because "dfsFs" is null
   > 
   
   Inconsistent S3 exposure in heartbeats
   Before a restart, one worker sometimes advertises a huge number of available S3 slots while its SSD is HIGH_DISK_USAGE:
   
   > "S3": "DiskInfo(maxSlots: 0, availableSlots: 137438953471, ... 
storageType: S3) status: HEALTHY"
   > "/mnt/celeborn": "... usableSpace: 0.0 B ... status: HIGH_DISK_USAGE"
   > 
   
   After a restart, S3 availableSlots drops to 0 on that same host, and the SSD shows free space again. This behavior suggests the tier choice is not consistently honored at file creation time.
   
   What I expected
   With createFilePolicy/evictPolicy = MEMORY,SSD,S3, once the SSD approaches its limit, new and evicted partitions should be written via the S3/DFS writer, not through LocalTierWriter/NIO. A sketch of the cascade I expect is below.
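   
   Concretely, what I'd expect (all names here are illustrative stand-ins, not Celeborn's internal API):
   
   ```scala
   object TierPolicySketch {
     sealed trait Tier
     case object Memory extends Tier
     case object Ssd    extends Tier
     case object S3     extends Tier
   
     // The first tier in policy order that still has capacity wins.
     def pickTier(policy: List[Tier], hasCapacity: Tier => Boolean): Option[Tier] =
       policy.find(hasCapacity)
   
     // In the HIGH_DISK_USAGE case (memory and SSD exhausted) the pick
     // should be S3, and the write should then use the DFS/S3 writer:
     val chosen: Option[Tier] = pickTier(List(Memory, Ssd, S3), _ == S3) // Some(S3)
   }
   ```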
   
   
   Why this PR looks relevant
   My stack trace shows that S3 is selected by policy, but the worker still constructs a local "disk file" writer and then tries to open an s3a: path with NIO (FileChannel.open), which cannot work.
   
   In the diff https://github.com/apache/celeborn/pull/3469/files#diff-332230b33db740720657fd9c90e4f4eb0bce18b43f4749fddfe37463cc11a9b1 the change makes StorageManager.createPartition(...) receive and honor the intended storageType, routing S3 creates through the DFS/S3 writer path instead of LocalTierWriter. That would directly address this mismatch; a rough sketch of that routing, as I read it, is below.
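   
   As I read the diff, the dispatch roughly looks like this (the types below are stand-ins I made up, not the actual Celeborn signatures):
   
   ```scala
   object WriterDispatchSketch {
     // Stand-in types; the real ones live in Celeborn's storage package.
     sealed trait StorageType
     object StorageType {
       case object SSD extends StorageType
       case object S3  extends StorageType
     }
   
     trait TierWriter
     final class LocalTierWriter extends TierWriter // writes via NIO FileChannel
     final class DfsTierWriter   extends TierWriter // writes via Hadoop FileSystem
   
     // With the storageType plumbed into partition creation, an S3 partition
     // is routed to the DFS writer and never reaches the NIO code path.
     def createPartitionWriter(storageType: StorageType): TierWriter =
       storageType match {
         case StorageType.S3 => new DfsTierWriter
         case _              => new LocalTierWriter
       }
   }
   ```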
   
   Could you confirm whether this change is likely to help with the issue I'm experiencing? If it has a reasonable chance of solving it, I'll rebuild Celeborn with your patch and run some saturation tests.

