Re: [PR] Optimize ProtoBufRecordExtractor with field descriptor caching [pinot]

via GitHub Thu, 29 Jan 2026 02:07:17 -0800


arunkumarucet commented on code in PR #17593:
URL: https://github.com/apache/pinot/pull/17593#discussion_r2740872785



##########
pinot-plugins/pinot-input-format/pinot-protobuf/src/main/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufRecordExtractor.java:
##########
@@ -68,23 +124,32 @@ private Object getFieldValue(Descriptors.FieldDescriptor fieldDescriptor, Messag
   @Override
   public GenericRow extract(Message from, GenericRow to) {
     Descriptors.Descriptor descriptor = from.getDescriptorForType();
-    if (_extractAll) {
-      for (Descriptors.FieldDescriptor fieldDescriptor : descriptor.getFields()) {
-        Object fieldValue = getFieldValue(fieldDescriptor, from);
-        if (fieldValue != null) {
-          fieldValue = convert(new ProtoBufFieldInfo(fieldValue, fieldDescriptor));
-        }
-        to.putValue(fieldDescriptor.getName(), fieldValue);
-      }
-    } else {
-      for (String fieldName : _fields) {
-        Descriptors.FieldDescriptor fieldDescriptor = descriptor.findFieldByName(fieldName);
-        Object fieldValue = fieldDescriptor == null ? null : getFieldValue(fieldDescriptor, from);
+
+    // Initialize or reinitialize cache if descriptor changed (handles schema evolution)
+    if (_cachedDescriptorFullName == null || !_cachedDescriptorFullName.equals(descriptor.getFullName())) {

Review Comment:
   In the current implementation, the descriptor never actually changes 
between messages. The ProtoBufMessageDecoder loads the descriptor from the 
.desc file once at initialization and parses every message with it, so 
message.getDescriptorForType() returns the same descriptor for every message 
within a decoder instance.
   
   The fullName comparison is defensive coding that handles:
   1. Future extensibility (e.g., if schema registry support is added)
   2. Potential reuse of the extractor across different message types
   3. Re-initialization via init() with different field sets
   
   Since the decoder guarantees a single descriptor per instance, multi-version 
caching isn't needed. The check is cheap (string comparison) and provides 
safety without impacting the common case.
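
For illustration, the check described above can be sketched without any protobuf dependency. This is only a minimal sketch: `SketchDescriptor`, `CachingExtractorSketch`, and the `Map`-based message are hypothetical stand-ins, not the classes in this PR, and the real extractor caches `FieldDescriptor` objects rather than field names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a protobuf Descriptor: a full name plus ordered field names.
final class SketchDescriptor {
  final String fullName;
  final List<String> fieldNames;

  SketchDescriptor(String fullName, List<String> fieldNames) {
    this.fullName = fullName;
    this.fieldNames = fieldNames;
  }
}

// Hypothetical extractor that rebuilds its field cache only when the
// descriptor's full name differs from the one it cached.
final class CachingExtractorSketch {
  private String cachedDescriptorFullName;
  private List<String> cachedFieldNames = new ArrayList<>();
  private int rebuildCount;

  Map<String, Object> extract(SketchDescriptor descriptor, Map<String, Object> message) {
    // One null check and one String.equals per message in the common case.
    if (cachedDescriptorFullName == null || !cachedDescriptorFullName.equals(descriptor.fullName)) {
      cachedFieldNames = new ArrayList<>(descriptor.fieldNames);
      cachedDescriptorFullName = descriptor.fullName;
      rebuildCount++;
    }
    Map<String, Object> row = new LinkedHashMap<>();
    for (String name : cachedFieldNames) {
      // Fields absent from the message are stored as null.
      row.put(name, message.get(name));
    }
    return row;
  }

  int getRebuildCount() {
    return rebuildCount;
  }
}
```

With a single descriptor per decoder instance, the cache in this sketch is built on the first message and never rebuilt; a rebuild happens only if a descriptor with a different full name is passed in.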



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


