rahil-c commented on code in PR #18304:
URL: https://github.com/apache/hudi/pull/18304#discussion_r2941634355
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java:
##########
@@ -100,34 +105,37 @@ public HoodieSparkLanceWriter(StoragePath file,
* @param sparkSchema Spark schema for the data
* @param taskContextSupplier Task context supplier for partition ID
* @param storage HoodieStorage instance
- * @throws IOException if writer initialization fails
*/
public HoodieSparkLanceWriter(StoragePath file,
StructType sparkSchema,
TaskContextSupplier taskContextSupplier,
- HoodieStorage storage) throws IOException {
- this(file, sparkSchema, null, taskContextSupplier, storage, false);
+ HoodieStorage storage) {
+ this(file, sparkSchema, null, taskContextSupplier, storage, false,
Option.empty());
}
@Override
public void writeRowWithMetadata(HoodieKey key, InternalRow row) throws
IOException {
if (populateMetaFields) {
UTF8String recordKey = UTF8String.fromString(key.getRecordKey());
updateRecordMetadata(row, recordKey, key.getPartitionPath(),
getWrittenRecordCount());
- super.write(row);
- } else {
- super.write(row);
}
+ bloomFilterWriteSupportOpt.ifPresent(bloomFilterWriteSupport -> {
+ UTF8String recordKey = UTF8String.fromString(key.getRecordKey());
Review Comment:
It would be ideal to allocate the `recordKey` only once in this function.
Currently we have this:
```
if (populateMetaFields) {
  UTF8String recordKey = UTF8String.fromString(key.getRecordKey()); // allocation 1
  updateRecordMetadata(row, recordKey, key.getPartitionPath(), getWrittenRecordCount());
}
bloomFilterWriteSupportOpt.ifPresent(bloomFilterWriteSupport -> {
  UTF8String recordKey = UTF8String.fromString(key.getRecordKey()); // allocation 2 (duplicate!)
  bloomFilterWriteSupport.addKey(recordKey);
});
```
So I think we can move
```
UTF8String recordKey = UTF8String.fromString(key.getRecordKey());
```
to the top of the function so that it is defined once and reused in both places.
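To make the suggestion concrete, here is a rough, self-contained sketch of the hoisted version. It does not use the real Hudi or Spark classes: `toUtf8` is a hypothetical stand-in for `UTF8String.fromString`, `KeySink` stands in for the bloom filter write support, and `metadataKeys` stands in for `updateRecordMetadata`. The `conversions` counter exists only to make the single allocation observable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class RecordKeyHoistSketch {
  // Counts conversions so the effect of hoisting is observable (illustration only).
  public static int conversions = 0;

  // Hypothetical stand-in for updateRecordMetadata(...): records the keys it was given.
  public static final List<byte[]> metadataKeys = new ArrayList<>();

  // Hypothetical stand-in for UTF8String.fromString(...): allocates a new byte[] per call.
  public static byte[] toUtf8(String s) {
    conversions++;
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Hypothetical stand-in for the bloom filter write support.
  public static final class KeySink {
    public final List<byte[]> keys = new ArrayList<>();

    public void addKey(byte[] key) {
      keys.add(key);
    }
  }

  public static void writeRowWithMetadata(String recordKeyStr,
                                          boolean populateMetaFields,
                                          Optional<KeySink> sinkOpt) {
    // Single conversion, shared by both consumers below.
    // The variable is effectively final, so the lambda can capture it.
    byte[] recordKey = toUtf8(recordKeyStr);
    if (populateMetaFields) {
      metadataKeys.add(recordKey);
    }
    sinkOpt.ifPresent(sink -> sink.addKey(recordKey));
  }

  public static void main(String[] args) {
    KeySink sink = new KeySink();
    writeRowWithMetadata("key-1", true, Optional.of(sink));
    System.out.println("conversions = " + conversions);
  }
}
```

One trade-off: hoisting unconditionally also converts the key when `populateMetaFields` is false and no bloom filter support is present. If that case matters, the conversion could be guarded by `populateMetaFields || bloomFilterWriteSupportOpt.isPresent()`.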