Jackie-Jiang commented on code in PR #16624:
URL: https://github.com/apache/pinot/pull/16624#discussion_r2286377964
##########
pinot-spi/src/main/java/org/apache/pinot/spi/recordtransformer/RecordTransformer.java:
##########
@@ -40,6 +41,11 @@ default Collection<String> getInputColumns() {
return List.of();
}
+ /// Provides hint to the transformer that which columns are required as
input across all the downstream transformers
+ /// in the TransformPipeline.
+ default void withInputColumnsForDownstreamTransformers(Set<String>
inputColumnsOfDownstream) {
Review Comment:
(minor)
```suggestion
default void withInputColumnsForDownstreamTransformers(Set<String>
inputColumns) {
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java:
##########
@@ -195,18 +207,20 @@ public List<GenericRow> transform(List<GenericRow>
records) {
flattenMap(record, columns);
transformedRecords.add(record);
} else {
- Map<String, Object> originalValues =
record.copy(_fieldsToUnnest).getFieldToValueMap();
+ Map<String, Object> originalValues =
record.copy(_fieldsToUnnestAndKeepOriginalValue).getFieldToValueMap();
Review Comment:
Check if it is empty before making the copy
##########
pinot-spi/src/main/java/org/apache/pinot/spi/recordtransformer/RecordTransformer.java:
##########
@@ -40,6 +41,11 @@ default Collection<String> getInputColumns() {
return List.of();
}
+ /// Provides hint to the transformer that which columns are required as
input across all the downstream transformers
+ /// in the TransformPipeline.
+ default void withInputColumnsForDownstreamTransformers(Set<String>
inputColumnsOfDownstream) {
Review Comment:
Let's also add some javadoc about the input columns can be modified
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java:
##########
@@ -358,22 +378,30 @@ protected void flattenMap(GenericRow record, List<String>
columns) {
void renamePrefixes(GenericRow record) {
assert !_prefixesToRename.isEmpty();
List<String> fields = new
ArrayList<>(record.getFieldToValueMap().keySet());
+ for (String field : fields) {
+ String newName = renamePrefix(field);
+ if (newName.equals(field)) {
+ continue;
+ }
+ Object value = record.removeValue(field);
+ if (newName.isEmpty() || record.getValue(newName) != null) {
+ throw new RuntimeException(
+ String.format("Name conflict after attempting to rename field %s
to %s", field, newName));
+ }
+ record.putValue(newName, value);
+ }
+ }
+
+ private String renamePrefix(String field) {
for (Map.Entry<String, String> entry : _prefixesToRename.entrySet()) {
- for (String field : fields) {
- String prefix = entry.getKey();
+ String prefix = entry.getKey();
+ if (field.startsWith(prefix)) {
String replacementPrefix = entry.getValue();
- if (field.startsWith(prefix)) {
- Object value = record.removeValue(field);
- String remainingColumnName = field.substring(prefix.length());
- String newName = replacementPrefix + remainingColumnName;
- if (newName.isEmpty() || record.getValue(newName) != null) {
- throw new RuntimeException(
- String.format("Name conflict after attempting to rename field
%s to %s", field, newName));
- }
- record.putValue(newName, value);
- }
+ String remainingColumnName = field.substring(prefix.length());
+ return replacementPrefix + remainingColumnName;
}
}
+ return field;
Review Comment:
To reduce overhead, we can return `null` here when there is no prefix match
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java:
##########
@@ -92,11 +95,12 @@ public class ComplexTypeTransformer implements
RecordTransformer {
public static final String DEFAULT_DELIMITER = ".";
public static final CollectionNotUnnestedToJson
DEFAULT_COLLECTION_TO_JSON_MODE =
CollectionNotUnnestedToJson.NON_PRIMITIVE;
- private final List<String> _fieldsToUnnest;
+ private final SortedSet<String> _fieldsToUnnest;
Review Comment:
We should be able to make both `_fieldsToUnnest` and
`_fieldsToUnnestAndKeepOriginalValue` as `List<String>` to reduce overhead when
processing the rows
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]