geserdugarov commented on code in PR #12796:
URL: https://github.com/apache/hudi/pull/12796#discussion_r1959542274
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/transform/RowDataToHoodieFunction.java:
##########
@@ -89,29 +79,17 @@ public void open(Configuration parameters) throws Exception
{
this.keyGenerator =
HoodieAvroKeyGeneratorFactory
.createKeyGenerator(flinkConf2TypedProperties(this.config));
- this.payloadCreation = PayloadCreation.instance(config);
}
- @SuppressWarnings("unchecked")
@Override
- public O map(I i) throws Exception {
- return (O) toHoodieRecord(i);
- }
-
- /**
- * Converts the give record to a {@link HoodieRecord}.
- *
- * @param record The input record
- * @return HoodieRecord based on the configuration
- * @throws IOException if error occurs
- */
- @SuppressWarnings("rawtypes")
- private HoodieRecord toHoodieRecord(I record) throws Exception {
+ public O map(I record) throws Exception {
+ // [HUDI-8969] Analyze how to get rid of excessive conversions, should be
a subtask for RFC-88
Review Comment:
I propose to leave this part as it is in this PR.
There are two main problems with `RowDataKeyGen`:
1. it doesn't support `CustomAvroKeyGenerator`,
2. it doesn't allow to process case, when partition column is string, and we
set key generator as:
```text
'hoodie.datasource.write.keygenerator.class'='org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator',
'hoodie.keygen.timebased.timestamp.type' = 'DATE_STRING',
'hoodie.keygen.timebased.input.dateformat' = 'yyyy-MM-dd HH:mm:ss',
'hoodie.keygen.timebased.output.dateformat' = 'yyyy-MM-dd'
```
Maybe those scenarios are not used by users. But I'm afraid that switch to
`RowDataKeyGen` could break some specific use cases.
I propose to review key generators in a separate task,
[HUDI-9042](https://issues.apache.org/jira/browse/HUDI-9042). This task should
be also useful for RFC-88. And any changes related to this task will be
localized in a small commit, which could be easily reverted.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]