kbendick edited a comment on pull request #3326:
URL: https://github.com/apache/iceberg/pull/3326#issuecomment-980423958


   > Thanks for the change Harsh! I've been thinking a lot about this change 
and I like it but I think it also highlights a confusing or awkward part of the 
Spark configuration. `type` and `catalog-type` kinda mean the same thing and 
when to use which is probably not super obvious for an end user. I wonder if we 
could deprecate one of them (my vote would be to deprecate `type`) and compare 
the other against first the Enum you've defined below and if no matches then 
look up the class name via reflection.
   
   I too have some concern about restricting the ability for people to make 
additional catalogs, though I do like standardizing the configs with specific 
naming patterns for reasons I'll outline below.
   
   But TLDR: I agree with @rymurr that it seems we're heading further away from 
the direction of a general `catalog-impl`, which I think should be the 
foundation so that things are built in such a way that more options can be 
included.
   
   I think in time, as Iceberg grows to be a more common piece of people's 
architecture, we'll likely see more 3rd party connectors that can't be open 
sourced for whatever reason or where it wouldn't make sense to integrate them 
into the main repository due to licensing issues. It's imported to ensure these 
folks have an integration path for more widespread adoption.
   
   I really do appreciate the contribution @harshm-dev, and will admit for a 
while I felt this was the better way to go, but thinking on it more I'm in 
agreement with @rymurr and feel that this might be steering us away from the 
general case of `catalog-impl` (notes below on an example).
   
   But I'd consider waiting until the Thanksgiving holiday is over in the US 
(so at least next week) for last thoughts from others before closing this PR 
@harshmdev. There might be use cases I'm missing where this makes sense and is 
a valuable addition to keeping `catalog-impl` (eg possibly with trino which 
does require more declared up front AFAIK or something else I'm not quite 
seeing).
   
   
   __**Specific concerns** (specific argument for general `catalog-impl` - feel 
free to skip)__:
   
   As far as configs go, I have had some concerns with the Flink catalog 
builder API. I'll grab the specific classes and open an issue if need be.
   
   The newer Flink connector API requires configuration options to be declared 
(both required and optional). We currently implement against the deprecated 
interface but long term we'll have to move off. The new API doesn't seem to 
allow the `*` option we presently use (last I checked at least). In order to 
make specific catalog configs work for catalogs that aren't part of Iceberg's 
distributed source / aren't "well known types", I currently feel we should move 
further in the direction of standardizing on a configuration format that can be 
specified by `catalog-impl` and then allow that to hook in at runtime to Flinks 
configuration declaration, etc. I'm very open to alternative arguments though.
   
   If we continue to have to support `catalog-impl`, it should probably be the 
first class concept so that things like the above don't get missed for the base 
case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to