kbendick edited a comment on pull request #3326: URL: https://github.com/apache/iceberg/pull/3326#issuecomment-980423958
> Thanks for the change Harsh! I've been thinking a lot about this change and I like it but I think it also highlights a confusing or awkward part of the Spark configuration. `type` and `catalog-type` kinda mean the same thing and when to use which is probably not super obvious for an end user. I wonder if we could deprecate one of them (my vote would be to deprecate `type`) and compare the other against first the Enum you've defined below and if no matches then look up the class name via reflection. I too have some concern about restricting the ability for people to make additional catalogs, though I do like standardizing the configs with specific naming patterns for reasons I'll outline below. But TLDR: I agree with @rymurr that it seems we're heading further away from the direction of a general `catalog-impl`, which I think should be the foundation so that things are built in such a way that more options can be included. I think in time, as Iceberg grows to be a more common piece of people's architecture, we'll likely see more 3rd party connectors that can't be open sourced for whatever reason or where it wouldn't make sense to integrate them into the main repository due to licensing issues. It's imported to ensure these folks have an integration path for more widespread adoption. __Specific concerns__: As far as configs go, I also have some concerns the Flink catalog builder. The newer Flink connector API requires configuration options to be declared (both required and optional). It doesn't seem to allow the `*` option we presently use (last I checked at least). In order to make that work for catalogs that aren't part of Iceberg's distributed source, I would move further in the direction of standardizing on a configuration format that can be specified by `catalog-impl` and then allow that to hook in at runtime to Flinks configuration declaration, etc. If we continue to have to support `catalog-impl`, it should probably be the first class concept so that things like the above don't get missed for the base case. I really do appreciate the contribution @harshm-dev, and will admit for a while I felt this was the better way to go, but thinking on it more I'm in agreement with @rymurr and feel that our efforts would be better spent standardizing on ways for catalogs to propagate their set of configuration values to the right places in a standardized way. That said, I'd consider maybe waiting until the Thanksgiving holiday is over in the US for last thoughts from others @harshmdev. There might be use cases I'm missing where this makes sense and is a valuable addition to keeping `catalog-impl` (eg possibly with trino which does require more declared up front AFAIK or something else I'm not quite seeing). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
