amogh-jahagirdar commented on PR #4925: URL: https://github.com/apache/iceberg/pull/4925#issuecomment-1250178504
Follow-up from our discussion in yesterday's sync.

## Create View Semantics

I think we'll probably need to better define what createView actually entails. There are two cases I can think of that createView should handle. createView can either:

1. Create a completely new view with an initial set of representations, i.e. the view is not yet defined in the catalog; or
2. Add new representation(s) to an existing view if and only if the new representations or SQL dialects don't already exist in the current view (if they do exist, the user must replace the view). This case should still increment the view version.

Case 2 could be a separate API, `AddViewRepresentation`, but that puts more work on the caller to determine whether a given view representation or dialect already exists in the view, which does not seem like the right behavior.

## Schema Validation of different representations

Another open question for case 2 is whether, and how, createView should validate that all the representations have the expected schema. Since a view represents the result of a computation, there must be a single schema, so ideally createView should validate that every representation produces that same schema. This is challenging given the different representations and SQL dialects. In the long run, I think projects like Coral or Substrait would be used to help perform schema validation across the different SQL dialects. As an extension, there could even be a Substrait view representation down the line. @wmoustafa @jacques-n Any recommendations on a good way to handle this schema validation within the view format itself? I'm taking a look at Coral and Substrait (apologies if this is a naive question).

For schema validation in the short term, I think engines should pass in the schema for each dialect to the createView API. createView then validates that the passed-in schema aligns with the view schema.
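To make the two createView cases and the short-term schema check concrete, here is a minimal sketch. It assumes hypothetical in-memory types (`Field`, `Representation`, `ViewVersion`, `InMemoryViewCatalog`) that are not part of Iceberg's actual view API. It also assumes that dialects are compared case-insensitively and that the engine supplies the schema for each representation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Entry point first so the file also runs with `java ViewCatalogSketch.java`.
class ViewCatalogSketch {
  public static void main(String[] args) {
    List<Field> schema = List.of(new Field("id", "bigint"), new Field("name", "string"));
    InMemoryViewCatalog catalog = new InMemoryViewCatalog();
    // Case 1: the view does not exist yet, so it is created at version 1.
    ViewVersion v1 = catalog.createView("db.v",
        List.of(new Representation("spark", "SELECT id, name FROM db.t", schema)));
    // Case 2: a new dialect is added to the existing view, bumping the version.
    ViewVersion v2 = catalog.createView("db.v",
        List.of(new Representation("trino", "SELECT id, name FROM db.t", schema)));
    System.out.println(v1.versionId() + " -> " + v2.versionId()
        + ", dialects=" + v2.representations().size()); // prints "1 -> 2, dialects=2"
  }
}

// Hypothetical types for illustration only; NOT Iceberg's view API.
record Field(String name, String type) {}

// `schema` is supplied by the engine (assumption), not derived from `sql`.
record Representation(String dialect, String sql, List<Field> schema) {}

record ViewVersion(int versionId, List<Field> schema, List<Representation> representations) {}

class InMemoryViewCatalog {
  private final Map<String, ViewVersion> views = new HashMap<>();

  ViewVersion createView(String name, List<Representation> newReps) {
    if (newReps.isEmpty()) {
      throw new IllegalArgumentException("at least one representation is required");
    }
    ViewVersion current = views.get(name);
    // The view schema is fixed by the existing version, or by the first new representation.
    List<Field> expected = current == null ? newReps.get(0).schema() : current.schema();

    // Short-term check: trust the engine-supplied schema and compare it to the view schema.
    // An engine could bypass this by passing the view schema regardless of the SQL.
    for (Representation rep : newReps) {
      if (!rep.schema().equals(expected)) {
        throw new IllegalArgumentException("schema mismatch for dialect " + rep.dialect());
      }
    }

    // Case 1 starts from no representations; case 2 starts from the current ones.
    List<Representation> merged =
        current == null ? new ArrayList<>() : new ArrayList<>(current.representations());
    for (Representation rep : newReps) {
      boolean exists = merged.stream().anyMatch(r -> r.dialect().equalsIgnoreCase(rep.dialect()));
      if (exists) {
        throw new IllegalStateException(
            "dialect " + rep.dialect() + " already exists; replace the view instead");
      }
      merged.add(rep);
    }

    // Both cases produce a new version; case 2 increments the existing version id.
    int nextId = current == null ? 1 : current.versionId() + 1;
    ViewVersion next = new ViewVersion(nextId, expected, List.copyOf(merged));
    views.put(name, next);
    return next;
  }
}
```

The sketch skips concurrency. A real catalog would need to do the read, check, and write as one atomic commit, so that two engines cannot add the same dialect at the same time.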
This could be bypassed by an engine simply passing in `view.schema()` even though it doesn't actually align with the view definition. So we could ...

Happy to get thoughts and suggestions on this! @jzhuge @danielcweek @jackye1995
