arunkumarucet commented on code in PR #17593:
URL: https://github.com/apache/pinot/pull/17593#discussion_r2740872785
##########
pinot-plugins/pinot-input-format/pinot-protobuf/src/main/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufRecordExtractor.java:
##########
@@ -68,23 +124,32 @@ private Object getFieldValue(Descriptors.FieldDescriptor fieldDescriptor, Messag
@Override
public GenericRow extract(Message from, GenericRow to) {
Descriptors.Descriptor descriptor = from.getDescriptorForType();
- if (_extractAll) {
- for (Descriptors.FieldDescriptor fieldDescriptor : descriptor.getFields()) {
- Object fieldValue = getFieldValue(fieldDescriptor, from);
- if (fieldValue != null) {
- fieldValue = convert(new ProtoBufFieldInfo(fieldValue, fieldDescriptor));
- }
- to.putValue(fieldDescriptor.getName(), fieldValue);
- }
- } else {
- for (String fieldName : _fields) {
- Descriptors.FieldDescriptor fieldDescriptor = descriptor.findFieldByName(fieldName);
- Object fieldValue = fieldDescriptor == null ? null : getFieldValue(fieldDescriptor, from);
+
+ // Initialize or reinitialize cache if descriptor changed (handles schema evolution)
+ if (_cachedDescriptorFullName == null || !_cachedDescriptorFullName.equals(descriptor.getFullName())) {
Review Comment:
In the current implementation, this scenario (the descriptor changing between
messages) doesn't actually occur. The ProtoBufMessageDecoder loads the
descriptor from the .desc file once at initialization, and all messages are
parsed using that same descriptor, so message.getDescriptorForType() returns
the same descriptor for every message within a decoder instance.
The fullName comparison is defensive coding that handles:
1. Future extensibility (e.g., if schema registry support is added)
2. Potential reuse of the extractor across different message types
3. Re-initialization via init() with different field sets
Since the decoder guarantees a single descriptor per instance, multi-version
caching isn't needed. The check is cheap (string comparison) and provides
safety without impacting the common case.
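To make the defensive check concrete, here is a minimal, self-contained sketch of the "rebuild only when the descriptor's full name changes" pattern. `SimpleDescriptor`, `CachingExtractor`, and `fieldsFor` are hypothetical stand-ins for illustration only; they are not protobuf's `Descriptors.Descriptor` API or the code in this PR. Only the `_cachedDescriptorFullName` field name is taken from the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DescriptorCacheSketch {

  /** Hypothetical stand-in for a protobuf descriptor: just a full name and field names. */
  static final class SimpleDescriptor {
    final String fullName;
    final List<String> fieldNames;

    SimpleDescriptor(String fullName, List<String> fieldNames) {
      this.fullName = fullName;
      this.fieldNames = fieldNames;
    }
  }

  /** Hypothetical extractor that caches field resolution per descriptor full name. */
  static final class CachingExtractor {
    private String _cachedDescriptorFullName;
    private List<String> _cachedFields = Collections.emptyList();
    private int _rebuildCount;

    List<String> fieldsFor(SimpleDescriptor descriptor) {
      // Cheap string comparison: only a different message type triggers a rebuild.
      if (_cachedDescriptorFullName == null || !_cachedDescriptorFullName.equals(descriptor.fullName)) {
        _cachedFields = new ArrayList<>(descriptor.fieldNames);
        _cachedDescriptorFullName = descriptor.fullName;
        _rebuildCount++;
      }
      return _cachedFields;
    }

    int rebuildCount() {
      return _rebuildCount;
    }
  }

  public static void main(String[] args) {
    CachingExtractor extractor = new CachingExtractor();
    SimpleDescriptor event = new SimpleDescriptor("pkg.Event", List.of("id", "ts"));
    SimpleDescriptor other = new SimpleDescriptor("pkg.Other", List.of("name"));

    extractor.fieldsFor(event); // first call: cache is empty, so it is built
    extractor.fieldsFor(event); // same full name: cached fields are reused
    extractor.fieldsFor(other); // different message type: cache is rebuilt
    System.out.println(extractor.rebuildCount()); // prints 2
  }
}
```

With one descriptor per decoder instance, the common path is a single `equals` call per message. Note that in this sketch the check keys on the full name only, so two versions of the same message type would share a name and not trigger a rebuild; that is consistent with the single-descriptor-per-decoder assumption above.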