tibrewalpratik17 opened a new pull request, #13103: URL: https://github.com/apache/pinot/pull/13103
label: `ingestion` `feature`

- At present, `SanitizationTransformer` simply trims a string value to the max length configured in the schema. This happens silently: there is no observability into whether a field was trimmed, and no option to skip the record instead of trimming it. This becomes critical when the field is JSON-indexed, because trimming can leave truncated, malformed JSON that eventually stops ingestion.
- Here, we add a new table-ingestion config, `_failOnTrimmedStringLength`. The default value is `false` to ensure backward compatibility. If it is set to `true` and an incoming string value exceeds the configured max length, the record is skipped and the skip is logged.
- Even when it is configured not to skip, trimmed records are now counted in the `INCOMPLETE_REALTIME_ROWS_CONSUMED` metric for better observability.
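For readers who want a concrete picture of the trim-or-skip behavior described in this pull request, here is a minimal standalone sketch. It is not the actual `SanitizationTransformer` code from the PR: the class `SanitizeSketch`, the method `sanitize`, the map-based row, and the `incompleteRows` counter (a stand-in for the `INCOMPLETE_REALTIME_ROWS_CONSUMED` metric) are all hypothetical names chosen for illustration. The description does not say whether skipped records are also counted in the metric, so this sketch counts only trimmed ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the described behavior; NOT Pinot's real transformer.
final class SanitizeSketch {
  // Stand-in for the INCOMPLETE_REALTIME_ROWS_CONSUMED metric (assumption).
  static long incompleteRows = 0;

  /**
   * Applies per-column max lengths to a row of string values.
   * Returns the sanitized copy of the row, or null if the row should be skipped.
   */
  static Map<String, String> sanitize(Map<String, String> row,
                                      Map<String, Integer> maxLengths,
                                      boolean failOnTrimmedStringLength) {
    Map<String, String> out = new LinkedHashMap<>(row);
    boolean trimmed = false;
    for (Map.Entry<String, Integer> limit : maxLengths.entrySet()) {
      String column = limit.getKey();
      int maxLength = limit.getValue();
      String value = out.get(column);
      if (value == null || value.length() <= maxLength) {
        continue; // nothing to do for this column
      }
      if (failOnTrimmedStringLength) {
        // Skip the whole record rather than ingest a truncated value.
        System.err.println("Skipping record: column '" + column
            + "' has length " + value.length() + " > max " + maxLength);
        return null;
      }
      // Backward-compatible path: trim, but remember that it happened.
      out.put(column, value.substring(0, maxLength));
      trimmed = true;
    }
    if (trimmed) {
      incompleteRows++; // trimmed rows are now observable
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> row = Map.of("payload", "{\"a\":1234}");
    Map<String, Integer> limits = Map.of("payload", 5);
    System.out.println(sanitize(row, limits, false)); // trimmed, malformed JSON
    System.out.println(sanitize(row, limits, true));  // record skipped
  }
}
```

With the flag off, the `payload` value is cut mid-document, which is the malformed-JSON case the description warns about; with the flag on, the record is dropped before it can reach a JSON index.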
