Re: [DISCUSS] At-rest encryption for shuffle data on workers and tiered storage

Aravind Patnam Sun, 19 Apr 2026 14:08:43 -0700

Hi,

We already have EAR for celeborn shuffle data internally at LinkedIn, where
we have added this support to respect the existing
spark.io.encryption.enabled config in Spark on the client side.


I am happy to contribute this back and start a CIP for this next week.

Thanks,
Aravind



On Sat, Apr 18, 2026 at 10:24 PM Karthik Prabhakar <[email protected]>
wrote:

> Hi dev@,
>
> I’d like to propose adding at-rest encryption for shuffle data in Celeborn
> and would appreciate the community’s input before writing a full
> implementation.
> cURRENT gap
>
> Celeborn encrypts data in transit (TLS, SASL) but not at rest. When a
> worker flushes shuffle data to local disk, HDFS, S3, or OSS, the bytes land
> as plaintext.
>
> The only write site for local disk is LocalFlushTask.flush() in
> FlushTask.scala (L66, L71 at commit a56f69a), which calls
> fileChannel.write(buffer) with no cipher transform. The tiered-storage
> paths (HdfsFlushTask, S3FlushTask, OssFlushTask) are the same — raw bytes
> to the underlying store.
>
> Verified with:
>
> grep -rnE 'cipher|\.encrypt|aes|envelope' worker/src/main/
> grep -rn  'javax\.crypto'                 worker/src/main/
> (both zero matches)
>
> This matters because spark.io.encryption.enabled does *not* cover the
> Celeborn path. When Celeborn’s ShuffleManager replaces Spark’s shuffle
> writer, Spark’s encryption key is never consulted — confirmed by grepping
> client-spark/ for IOEncryptionKey (zero matches).
>
> Teams adopting Celeborn for performance silently lose shuffle-encryption
> guarantees their compliance posture may assume.
> Who Needs This
>
>    - Regulated industries (healthcare, finance, public sector) whose
>    auditors require application-layer encryption independent of disk/volume
>    encryption.
>    - Multi-tenant platforms needing cryptographic isolation between tenants
>    on shared workers.
>    - Teams using object-store tiering who want encryption before offload.
>
> Proposed Approach (High Level)
>
>    1. A *StreamCipher SPI* in common/ for wrapping WritableByteChannel /
>    ReadableByteChannel with encrypt/decrypt. No KMS SDK in core.
>    2. A *KeyService SPI* for envelope encryption — generate/unwrap DEKs
>    using a KMS-held KEK. Implementations live in separate optional modules
> (
>    aws-kms, gcp-kms, azure-kv, vault, static for dev/PoC).
>    3. Wire into the worker write path: LocalFlushTask wraps fileChannel
>     with StreamCipher.wrapForWrite(). Same for HDFS/S3/OSS flush tasks.
>    4. Wire into the reader path: LocalPartitionDataReader detects a 16-byte
>    encrypted-file header, unwraps the DEK (cached per worker+shuffle),
> wraps
>    the channel with StreamCipher.wrapForRead().
>    5. Opt-in via celeborn.shuffle.io.encryption.enabled=true. Default off.
>    Unencrypted deployments are byte-identical to today, zero overhead.
>    6. Per-shuffle DEKs by default (one KMS call per shuffle reservation,
>    amortized). Per-application DEK scope as an option.
>
> Interaction with Recent Work
>
> CELEBORN-2301 (commit 95419e1) recently landed enhanced zero-copy sendfile
> for FileRegion on native transports — a nice throughput win for the fetch
> path.
>
> Encryption and sendfile are fundamentally incompatible: sendfile(2) cannot
> transform bytes, so encrypted partitions must use a buffered read path.
> This is only relevant for encrypted workloads; unencrypted workloads on the
> same cluster keep the full CELEBORN-2301 benefit. Per-application
> encryption flags (not per-cluster) would let encrypted and unencrypted apps
> coexist without regressing the latter.
> Questions for the Community
>
> Trimming to three since these are the ones I’d need opinions on before
> writing code. Happy to take the rest up in follow-ups.
>
>    - Any prior design work or internal discussion on this topic I should
>    know about before proceeding?
>    - *Per-shuffle vs. per-application DEK scope* as the default?
>    Per-shuffle gives smaller blast radius and simpler lifecycle;
>    per-application amortizes KMS round-trips and is friendlier for
>    long-running jobs.
>    - *Key distribution path:* wrapped DEKs flow through Master metadata
>    (simpler, one KMS-aware role) vs. workers unwrap directly from KMS
> (removes
>    Master from the key path, but every worker needs KMS credentials).
>    Preference?
>
> Tracking
>
> JIRA: CELEBORN-2311 <https://issues.apache.org/jira/browse/CELEBORN-2311>
>
> I have a detailed design document with source citations, threat model,
> performance analysis, and phased implementation plan. Happy to share
> on-list or off-list if there’s interest.
>
> - Karthik
>


-- 
Aravind K. Patnam

Re: [DISCUSS] At-rest encryption for shuffle data on workers and tiered storage

Reply via email to