ever4Kenny opened a new pull request, #3581:
URL: https://github.com/apache/celeborn/pull/3581
### What changes were proposed in this pull request?
This PR converts the static initialization of columnar shuffle class
constructors
and skew shuffle method to lazy initialization using the
initialization-on-demand
holder idiom (static inner class pattern) in SparkUtils.java.
Specifically, the following changes were made:
1. Introduced `ColumnarHashBasedShuffleWriterConstructorHolder` static inner
class
to lazily initialize the constructor for ColumnarHashBasedShuffleWriter
2. Introduced `ColumnarShuffleReaderConstructorHolder` static inner class to
lazily
initialize the constructor for CelebornColumnarShuffleReader
3. Introduced `CelebornSkewShuffleMethodHolder` static inner class to lazily
initialize the `isCelebornSkewedShuffle` method reference
4. Modified `createColumnarHashBasedShuffleWriter()`,
`createColumnarShuffleReader()`,
and `isCelebornSkewShuffleOrChildShuffle()` methods to use the holder
pattern for
lazy initialization
5. Added JavaDoc comments explaining the lazy loading mechanism
### Why are the changes needed?
The current implementation statically initializes columnar shuffle class
constructors
and the skew shuffle method at SparkUtils class loading time, which means
these
classes/methods are loaded regardless of whether they are actually used.
This lazy loading approach ensures that:
- Columnar shuffle classes are only loaded when actually needed (when
`celeborn.columnarShuffle.enabled` is true and the create methods are
called)
- CelebornShuffleState class is only loaded when skew shuffle detection is
needed
- Reduces unnecessary class loading overhead for users not using these
features
- Improves startup performance and memory footprint
- Aligns with the conditional usage pattern already present in
SparkShuffleManager
The static holder pattern (initialization-on-demand holder idiom) provides
several
advantages:
- Thread-safe without explicit synchronization (guaranteed by JVM class
loading mechanism)
- No synchronization overhead at runtime (no volatile reads or lock
acquisition)
- Simpler and more concise code compared to double-checked locking
- Recommended by Effective Java (Item 83) for lazy initialization
### Does this PR resolve a correctness bug?
No, this is a performance optimization.
### Does this PR introduce any user-facing change?
No. This change only affects when certain classes are loaded internally.
The functionality and API remain unchanged.
### How was this patch tested?
- Code review to verify correct implementation of the
initialization-on-demand holder pattern
- Verified that JVM class loading guarantees thread safety (JLS ยง12.4.2)
- Analyzed existing columnar shuffle and skew shuffle test coverage in the
codebase
- The changes are backward compatible and don't alter functionality, only
initialization timing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]