potiuk edited a comment on issue #20709:
URL: https://github.com/apache/airflow/issues/20709#issuecomment-1008582826


   We had the first one initially, and the problem was that some optional 
features of a provider (plyvel was one example) also throw "No module" 
errors (when the LevelDB Hook, which needs plyvel, is imported). They are 
essentially indistinguishable from "random" import errors, similar to what you 
experienced with Hive, and if we convert them into errors they will also flood 
the logs. We have to remember that the providers manager is initialized, and 
hooks get imported, by the webserver every time gunicorn restarts. 
   
   So we really need some way to distinguish the "known import errors" 
that we should essentially ignore from the "unknown module errors" that we 
should flag as a problem.
   
   I think we must have a way for provider developers to flag certain 
features as optional other than by extras (as is the case for Plyvel now). 
Currently the Plyvel dependencies are only installed if someone ran one of 
these two:
   
   ```
   pip install apache-airflow-providers-google[leveldb]
   pip install apache-airflow[leveldb]
   ```
   (or manually installed plyvel as a dependency)
   
   The library that needs to be installed in order to get LevelDB to "work" is 
`plyvel`. So what we really need is a way to know that if the "plyvel" import 
fails when we are trying to import the LevelDB Hook, it should be ignored, as 
this optional Hook is not expected to work in this case. 
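
As a rough illustration of telling the failures apart: Python's `ModuleNotFoundError` carries the missing module in its `name` attribute, so the importing code can check which dependency was absent. The helper name `missing_module_name` is hypothetical and is not part of Airflow; this is only a sketch of the idea.

```python
import importlib
from typing import Optional


def missing_module_name(module_path: str) -> Optional[str]:
    """Try to import module_path.

    Returns None if the import succeeds, otherwise the top-level name of
    the module that could not be found (for example "plyvel").
    Hypothetical helper, not an Airflow API.
    """
    try:
        importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        # exc.name is the module Python failed to locate; it may be a
        # dotted path, so keep only the top-level package name.
        return (exc.name or "").split(".")[0]
    return None
```

With something like this, a failed import of the LevelDB hook module could be compared against "plyvel", while any other missing module would still be treated as a real problem.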
   
   A different exception is a nice idea, but the problem with it is that we 
cannot use it in providers if we want to keep Airflow 2.1 compatibility 
(because that exception will not exist in Airflow 2.1).
   
   I thought about it, and the best thing I came up with is to add a new 
feature in `provider.yaml`/`provider_info` to mark certain Hooks as optional. 
I think that might make the most sense:
   
   ```
   connection-types:
     - hook-class-name: airflow.providers.google.common.hooks.base_google.GoogleBaseHook
       connection-type: google_cloud_platform
     - hook-class-name: airflow.providers.google.cloud.hooks.dataprep.GoogleDataprepHook
       connection-type: dataprep
     - hook-class-name: airflow.providers.google.cloud.hooks.cloud_sql.CloudSQLHook
       connection-type: gcpcloudsql
     - hook-class-name: airflow.providers.google.cloud.hooks.cloud_sql.CloudSQLDatabaseHook
       connection-type: gcpcloudsqldb
     - hook-class-name: airflow.providers.google.cloud.hooks.bigquery.BigQueryHook
       connection-type: gcpbigquery
     - hook-class-name: airflow.providers.google.cloud.hooks.compute_ssh.ComputeEngineSSHHook
       connection-type: gcpssh
     - hook-class-name: airflow.providers.google.leveldb.hooks.leveldb.LevelDBHook
       connection-type: leveldb
       optional-when-import-error:
         - "plyvel"
   ```
   
   This would allow us to downgrade specific import errors for specific 
hooks to debug-level logging, and at the same time we would not have to 
"hard-code" known import errors. I think that solves all the problems.
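
A minimal sketch of how the providers manager might consume such a field, assuming the entry is already parsed into a dict. The function name `log_level_for_import_error` is hypothetical, and `optional-when-import-error` is only the key proposed in this comment, not an existing Airflow setting.

```python
import logging
from typing import Any, Dict


def log_level_for_import_error(entry: Dict[str, Any], exc: ImportError) -> int:
    """Pick a log level for an import failure of one connection-types entry.

    `entry` is one item of the `connection-types` list. If the missing
    module is listed under the proposed `optional-when-import-error` key,
    the failure is expected and only worth a debug message; otherwise it
    is flagged as a warning. Hypothetical helper, not an Airflow API.
    """
    optional_modules = set(entry.get("optional-when-import-error", []))
    # ImportError.name is the module that failed to import (may be None).
    missing = (exc.name or "").split(".")[0]
    if missing in optional_modules:
        return logging.DEBUG
    return logging.WARNING
```

The caller would catch the `ImportError` raised while importing the hook class and log it at the returned level, so a missing `plyvel` stays quiet while an unexpected missing module remains visible.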
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

