adnanhemani commented on PR #2962:
URL: https://github.com/apache/polaris/pull/2962#issuecomment-3500061653

   > I have some reservations regarding a common transformation layer for all 
data sinks. While the idea is sound in principle, I'm skeptical about reaching 
a consensus on the specific transformations to apply and information to redact.
   
   I completely agree with this actually - a common transformation layer for 
all data sinks is likely not possible. But there can (and should) be groupings 
of Event Listener implementations that follow similar transformations. i.e. AWS 
CloudWatch Logs, Google Cloud Logging, and Azure Monitor Logs (I've not begun 
work on the latter two but are future enhancements I am planning for) all have 
similar maximum record sizes, and we should safely remove the `TableMetadata` 
objects wherever they exist in an event for these sorts of listeners as a 
result.
   
   > Finally, I don't believe an intermediary Map representation offers much 
benefit.
   
   I don't think having an intermediary Map representation is a requirement by 
any means, but it seemed like the most straight-forward way to grab only the 
relevant information from each event - even if it is quite verbose. If there is 
a way to make this much cleaner using MixIns solely, given the concerns that 
exist, I am all for it.
   
   > I understand and share the security concerns, but we can't delay this PR 
indefinitely.
   
   While I understand the thought here, I cannot advocate/approve for this PR 
if it does not resolve the security concerns, at a minimum. We should not be 
introducing security concerns into our code base (regardless of "beta" status 
of a feature) in the name of simply moving things forward. Polaris has not done 
so in the past, and we should continue to apply the same bar for the future. cc 
@snazy who has pointed out many security concerns in the past.
   
   If you're still in support of merging the PR through without resolving the 
security concerns, let's start a thread on the ML to ensure the community is 
aware of this security gap, supports this decision, and acknowledges the 
precedence that this may inadvertently set for the future.
   
   > Manually addressing each event for transformations is incredibly 
time-consuming, as you've noted.
   
   That’s precisely why only one event was initially wired through in the 
original PRs. I’m not against adding more events — a few at a time — as long as 
they’re properly reviewed to ensure their payloads don’t introduce any issues. 
Even in that case, we should first outline a clear approach for handling events 
that may require additional filtering. Committing fully to a method without 
understanding how it will accommodate all scenarios wouldn’t be a prudent path 
forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to