INNOCENT-BOY opened a new pull request, #12138:
URL: https://github.com/apache/pinot/pull/12138
We are the Cisco WAP Storage Team, and we found a step that needs optimization.
One of our Pinot clusters was unstable for a period of time. We first looked for
the error in the Pinot server log, but it was flooded with unnecessary
transformer error logs. During troubleshooting, we found the root cause:
1. Several tables share the same Kafka topic. Each table uses a filter config to
ingest only its own data, for example:
`"filterConfig": {
  "filterFunction": "Groovy({featureName != \"wap_unified_monitor\"}, featureName)"
},`
2. CompositeTransformer holds a list of transformers. Some records are already
marked as skipped records, but they are still passed through the remaining
transformers. I think this produces misleading error logs and unnecessary
computation. The transformer chain is built like this:
`Stream.of(new ExpressionTransformer(tableConfig, schema), new FilterTransformer(tableConfig),
    new SchemaConformingTransformer(tableConfig, schema), new DataTypeTransformer(tableConfig, schema),
    new TimeValidationTransformer(tableConfig, schema), new SpecialValueTransformer(schema),
    new NullValueTransformer(tableConfig, schema), new SanitizationTransformer(schema))
    .filter(t -> !t.isNoOp())
    .collect(Collectors.toList())`
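The idea of stopping the chain once a record is marked as skipped can be illustrated with a small, self-contained sketch. It is not the actual patch and does not use Pinot's real classes: `SkipAwareComposite`, `SKIP_KEY`, the map-based record, and the three stand-in transformers are all hypothetical names chosen for illustration. The filter step is assumed to mark records that match the condition from the filter config in item 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a composite that stops once a record is marked as skipped.
final class SkipAwareComposite {
  // Assumed marker key; the name is illustrative, not Pinot's constant.
  static final String SKIP_KEY = "$SKIP_RECORD$";

  private final List<UnaryOperator<Map<String, Object>>> transformers;

  SkipAwareComposite(List<UnaryOperator<Map<String, Object>>> transformers) {
    this.transformers = transformers;
  }

  Map<String, Object> transform(Map<String, Object> record) {
    for (UnaryOperator<Map<String, Object>> transformer : transformers) {
      // Short-circuit: a skipped record is not handed to the remaining transformers.
      if (Boolean.TRUE.equals(record.get(SKIP_KEY))) {
        return record;
      }
      record = transformer.apply(record);
    }
    return record;
  }

  // Runs one record through three stand-in transformers and returns
  // the names of the transformers that were actually invoked.
  static List<String> demo(String featureName) {
    List<String> calls = new ArrayList<>();
    UnaryOperator<Map<String, Object>> expression = r -> {
      calls.add("expression");
      return r;
    };
    UnaryOperator<Map<String, Object>> filter = r -> {
      calls.add("filter");
      // Assumption: records matching the filter condition are marked as skipped.
      if (!"wap_unified_monitor".equals(r.get("featureName"))) {
        r.put(SKIP_KEY, Boolean.TRUE);
      }
      return r;
    };
    UnaryOperator<Map<String, Object>> dataType = r -> {
      calls.add("dataType");
      return r;
    };

    Map<String, Object> record = new HashMap<>();
    record.put("featureName", featureName);
    new SkipAwareComposite(Arrays.asList(expression, filter, dataType)).transform(record);
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(demo("other_feature"));
    System.out.println(demo("wap_unified_monitor"));
  }
}
```

In this sketch a record that the filter marks as skipped never reaches the stand-in `dataType` step, so later transformers cannot log errors for a record that will be dropped anyway.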
So we added a patch to CompositeTransformer. Could the Pinot maintainers please
review this PR? Thanks in advance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]