nsivabalan commented on code in PR #13614:
URL: https://github.com/apache/hudi/pull/13614#discussion_r2232101872
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/KeyBasedFileGroupRecordBuffer.java:
##########
@@ -99,11 +99,9 @@ public void processDataBlock(HoodieDataBlock dataBlock,
Option<KeySpec> keySpecO
@Override
public void processNextDataRecord(BufferedRecord<T> record, Serializable recordKey) throws IOException {
BufferedRecord<T> existingRecord = records.get(recordKey);
- Option<BufferedRecord<T>> bufferRecord = doProcessNextDataRecord(record, existingRecord);
Review Comment:
So we can delete the `doProcessNextDataRecord` method as well, right?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/BufferedRecordSerializer.java:
##########
@@ -31,52 +31,81 @@
/**
* An implementation of {@link CustomSerializer} for {@link BufferedRecord}.
- *
*/
public class BufferedRecordSerializer<T> implements CustomSerializer<BufferedRecord<T>> {
- public static final int KRYO_SERIALIZER_INITIAL_BUFFER_SIZE = 1048576;
- private final Kryo kryo;
- // Caching ByteArrayOutputStream to avoid recreating it for every operation
- private final ByteArrayOutputStream baos;
+ // Caching kryo serializer to avoid creating kryo instance for every serde operation
+ private static final ThreadLocal<InternalSerializerInstance> SERIALIZER_REF =
Review Comment:
I'm curious to understand the overhead here.
We would be instantiating this just once per file group, right (even before this
patch)? I see we use this in ExternalSpillableMap for the merged log records,
so it's one instance per file group.
So how much benefit do we actually get here?
Also, where do we access these across threads? Isn't ThreadLocal.withInitial()
mainly for thread safety, so that each thread gets its own local copy?
Can you shed some light, please?
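To make the question concrete, here is a minimal sketch of what a static `ThreadLocal.withInitial()` does. The class names (`CostlySerializer`, `ThreadLocalSketch`) are hypothetical stand-ins, not Hudi code, and the counter exists only to make instance creation visible. Each thread builds the cached object once on its first `get()` and reuses it afterwards, so the instance count tracks the number of threads, not the number of calls.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-build serializer
// (for example, one that wraps a Kryo instance). Not Hudi code.
final class CostlySerializer {
    static final AtomicInteger CREATED = new AtomicInteger();

    CostlySerializer() {
        CREATED.incrementAndGet(); // count constructions for illustration
    }
}

final class ThreadLocalSketch {
    // One lazily created instance per thread, shared by every caller on that thread.
    private static final ThreadLocal<CostlySerializer> SERIALIZER_REF =
        ThreadLocal.withInitial(CostlySerializer::new);

    // Runs `callsPerThread` lookups on each of `threads` fresh threads and
    // returns how many CostlySerializer instances were constructed in total.
    static int instancesBuilt(int threads, int callsPerThread) {
        CostlySerializer.CREATED.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int c = 0; c < callsPerThread; c++) {
                    SERIALIZER_REF.get(); // first call on a thread constructs, later calls reuse
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return CostlySerializer.CREATED.get();
    }
}
```

So with a static `ThreadLocal`, the saving over a per-instance field would only show up if a single thread creates many serializer instances over its lifetime. Whether that is the case here is exactly what I would like to understand.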