Jackie-Jiang commented on code in PR #16624:
URL: https://github.com/apache/pinot/pull/16624#discussion_r2283650928
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java:
##########
@@ -180,6 +183,11 @@ public ComplexTypeTransformer build() {
}
}
+ @Override
+ public void withInputColumnsOfDownStreamTransformers(Collection<String> columns) {
+ _fieldsNeededForDownstreamTransformers = new HashSet<>(columns);
Review Comment:
There shouldn't be a need to make a copy here.
##########
pinot-spi/src/main/java/org/apache/pinot/spi/recordtransformer/RecordTransformer.java:
##########
@@ -40,6 +40,11 @@ default Collection<String> getInputColumns() {
return List.of();
}
+ /// Provides hint to the transformer that which columns are required as input across all the downstream transformers
+ /// in the TransformPipeline.
+ default void withInputColumnsOfDownStreamTransformers(Collection<String> inputColumnsOfDownstream) {
Review Comment:
This can take a `Set<String>` directly as the argument.
(minor) Suggest renaming it to `withInputColumnsForDownstreamTransformers`:
change `of` to `for`, and treat "downstream" as one word.
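A minimal sketch of what the suggested signature could look like, assuming the hint stays a no-op default method. `SketchTransformer` and `SketchComplexTypeTransformer` are hypothetical stand-ins, not the actual Pinot classes. Taking a `Set<String>` also lets an implementation keep the reference without copying:

```java
import java.util.Set;

// Hypothetical stand-in for RecordTransformer; only the hint method is shown.
interface SketchTransformer {
  // Hint: columns required as input by the downstream transformers in the pipeline.
  default void withInputColumnsForDownstreamTransformers(Set<String> inputColumns) {
    // No-op by default; transformers that can use the hint override this.
  }
}

// Hypothetical implementation that keeps the caller's set as-is (no defensive copy).
class SketchComplexTypeTransformer implements SketchTransformer {
  private Set<String> _inputColumnsForDownstreamTransformers;

  @Override
  public void withInputColumnsForDownstreamTransformers(Set<String> inputColumns) {
    _inputColumnsForDownstreamTransformers = inputColumns;
  }

  Set<String> getInputColumnsForDownstreamTransformers() {
    return _inputColumnsForDownstreamTransformers;
  }
}
```

Storing the reference is only safe if the caller does not mutate the set afterwards, which this sketch assumes.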
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/TransformPipeline.java:
##########
@@ -53,11 +53,14 @@ public TransformPipeline(String tableNameWithType,
List<RecordTransformer> trans
_tableNameWithType = tableNameWithType;
_transformers = transformers;
FilterTransformer filterTransformer = null;
- for (RecordTransformer recordTransformer : transformers) {
+ Set<String> cumulativeInputColumns = new HashSet<>();
Review Comment:
You can cache this as the return value of `getInputColumns()`.
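One possible reading of this, sketched under assumptions: the pipeline computes the union of input columns once in its constructor, keeps it in a field, and returns that field from its own `getInputColumns()`. `SketchRecordTransformer` and `SketchPipeline` are hypothetical names for illustration, not the real `RecordTransformer` or `TransformPipeline`:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for RecordTransformer; only getInputColumns() is shown.
interface SketchRecordTransformer {
  default Collection<String> getInputColumns() {
    return List.of();
  }
}

// Hypothetical pipeline that computes the union of input columns once.
class SketchPipeline {
  private final Set<String> _inputColumns;

  SketchPipeline(List<SketchRecordTransformer> transformers) {
    Set<String> cumulativeInputColumns = new HashSet<>();
    for (SketchRecordTransformer transformer : transformers) {
      cumulativeInputColumns.addAll(transformer.getInputColumns());
    }
    // Cache the result so later calls do not walk the transformers again.
    _inputColumns = cumulativeInputColumns;
  }

  // Returns the cached set instead of recomputing it on every call.
  Set<String> getInputColumns() {
    return _inputColumns;
  }
}
```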
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/ComplexTypeTransformer.java:
##########
@@ -233,6 +241,14 @@ public List<GenericRow> transform(List<GenericRow> records) {
}
}
}
+ if (_fieldsNeededForDownstreamTransformers != null) {
Review Comment:
Let's skip adding the column in the first place, instead of removing columns at
the end. Removing all other columns is more expensive, and far more dangerous,
than just not adding the unnested column when it is not included.
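To illustrate the idea, here is a rough sketch that filters at insertion time. It uses a plain `Map<String, Object>` as the record rather than `GenericRow`, a `.` delimiter, and a hypothetical `SketchFlattener` helper; none of these reflect the actual `ComplexTypeTransformer` code. A `null` set is assumed to mean that no hint was provided:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of Pinot.
class SketchFlattener {
  // Copies nested fields to the top level as "parent.child", skipping any flattened
  // column that is not needed downstream. A null set means "no hint, keep everything".
  static void flatten(Map<String, Object> record, String column, Set<String> neededColumns) {
    Object value = record.get(column);
    if (!(value instanceof Map)) {
      return;
    }
    for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
      String flattenedName = column + "." + entry.getKey();
      // Filter when adding; columns already in the record are never touched.
      if (neededColumns == null || neededColumns.contains(flattenedName)) {
        record.put(flattenedName, entry.getValue());
      }
    }
  }
}
```

Because nothing is ever removed, a column that some other component relies on cannot be dropped by accident; the worst case is that a flattened column is not created.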