adnanhemani commented on PR #2962: URL: https://github.com/apache/polaris/pull/2962#issuecomment-3500061653
> I have some reservations regarding a common transformation layer for all data sinks. While the idea is sound in principle, I'm skeptical about reaching a consensus on the specific transformations to apply and information to redact. I completely agree with this actually - a common transformation layer for all data sinks is likely not possible. But there can (and should) be groupings of Event Listener implementations that follow similar transformations. i.e. AWS CloudWatch Logs, Google Cloud Logging, and Azure Monitor Logs (I've not begun work on the latter two but are future enhancements I am planning for) all have similar maximum record sizes, and we should safely remove the `TableMetadata` objects wherever they exist in an event for these sorts of listeners as a result. > Finally, I don't believe an intermediary Map representation offers much benefit. I don't think having an intermediary Map representation is a requirement by any means, but it seemed like the most straight-forward way to grab only the relevant information from each event - even if it is quite verbose. If there is a way to make this much cleaner using MixIns solely, given the concerns that exist, I am all for it. > I understand and share the security concerns, but we can't delay this PR indefinitely. While I understand the thought here, I cannot advocate/approve for this PR if it does not resolve the security concerns, at a minimum. We should not be introducing security concerns into our code base (regardless of "beta" status of a feature) in the name of simply moving things forward. Polaris has not done so in the past, and we should continue to apply the same bar for the future. cc @snazy who has pointed out many security concerns in the past. If you're still in support of merging the PR through without resolving the security concerns, let's start a thread on the ML to ensure the community is aware of this security gap, supports this decision, and acknowledges the precedence that this may inadvertently set for the future. > Manually addressing each event for transformations is incredibly time-consuming, as you've noted. That’s precisely why only one event was initially wired through in the original PRs. I’m not against adding more events — a few at a time — as long as they’re properly reviewed to ensure their payloads don’t introduce any issues. Even in that case, we should first outline a clear approach for handling events that may require additional filtering. Committing fully to a method without understanding how it will accommodate all scenarios wouldn’t be a prudent path forward. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
