skywalker0618 opened a new issue, #18397: URL: https://github.com/apache/hudi/issues/18397
### Task Description

Within Uber, we use Hudi with Parquet, not ORC. However, we found that even though no ORC-related functions are called at runtime, the Hudi streaming source on 1.2 still depends on the ORC package, for the following reasons:

1. The `HoodieSplitReaderFunction` class implements `Serializable` and has a `HoodieWriteConfig` member ([code](https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.java#L59C17-L59C34)).
2. The class eagerly creates this `writeConfig` in its constructor ([code](https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.java#L82)).
3. During Flink job deployment, `HoodieSplitReaderFunction` is serialized and sent from the JobManager (JM) to the TaskManager (TM), which causes all non-transient members (including `writeConfig`) to be serialized.
4. The serializer uses reflection to check whether the class defines `readObject`/`writeObject` methods.
5. That reflection needs the definition of every class that appears in the method signatures of `HoodieWriteConfig`, which includes `org.apache.orc.CompressionKind`.

Hence, even though the job does not use ORC, it fails with this class-not-found error when the ORC dependency is not included:

`Caused by: java.lang.NoClassDefFoundError: org/apache/orc/CompressionKind`
`at java.base/java.lang.Class.getDeclaredMethods0(Native Method)`

Proposed solution:

1. Make the `HoodieWriteConfig` member transient: `private transient HoodieWriteConfig writeConfig;`
2. Remove the eager construction from the constructor of `HoodieSplitReaderFunction`.
3. Add lazy initialization of `writeConfig`, like this: `private HoodieWriteConfig getOrCreateWriteConfig() { if (writeConfig == null) { writeConfig = FlinkWriteClients.getHoodieClientConfig(configuration); } return writeConfig; }`

This solution is similar to how the class already handles `hadoopConfig` ([code](https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.java#L143)).

### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable)

**Related issues:**
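
The proposed transient-plus-lazy-initialization pattern can be tried out in isolation with plain Java serialization. The following is a minimal sketch, not the actual Hudi change: `LazyTransientConfigSketch`, `HeavyConfig` (standing in for `HoodieWriteConfig`), `ReaderFunction` (standing in for `HoodieSplitReaderFunction`), and the `basePath` field (standing in for the Flink configuration) are all hypothetical names introduced for illustration. `HeavyConfig` is deliberately not `Serializable`, so the round trip only succeeds because the field is `transient` and therefore skipped by the serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LazyTransientConfigSketch {

  // Hypothetical stand-in for HoodieWriteConfig. It is intentionally NOT
  // Serializable: a non-transient field of this type would make
  // writeObject() fail with NotSerializableException.
  static final class HeavyConfig {
    final String basePath;

    HeavyConfig(String basePath) {
      this.basePath = basePath;
    }
  }

  // Hypothetical stand-in for HoodieSplitReaderFunction.
  static final class ReaderFunction implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for the serializable configuration the real class keeps.
    private final String basePath;

    // transient: excluded from the serialized form, null after deserialization.
    private transient HeavyConfig writeConfig;

    ReaderFunction(String basePath) {
      this.basePath = basePath; // no eager construction of writeConfig here
    }

    // Builds the config on first use, on whichever JVM calls it.
    HeavyConfig getOrCreateWriteConfig() {
      if (writeConfig == null) {
        writeConfig = new HeavyConfig(basePath);
      }
      return writeConfig;
    }

    boolean isWriteConfigInitialized() {
      return writeConfig != null;
    }
  }

  // Serializes and deserializes the function in memory, imitating the
  // JM-to-TM shipment with standard Java serialization.
  static ReaderFunction roundTrip(ReaderFunction fn)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(fn);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (ReaderFunction) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    ReaderFunction sender = new ReaderFunction("/tmp/hudi_table");
    sender.getOrCreateWriteConfig(); // initialized on the sending side

    ReaderFunction receiver = roundTrip(sender);
    // The transient field did not travel with the object.
    System.out.println(receiver.isWriteConfigInitialized());
    // It is rebuilt on first access from the serialized basePath.
    System.out.println(receiver.getOrCreateWriteConfig().basePath);
  }
}
```

Note that the lazy getter in this sketch is not synchronized; whether that matters for the real class depends on how many threads call it, which the issue does not state.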
