potiuk edited a comment on issue #20709:
URL: https://github.com/apache/airflow/issues/20709#issuecomment-1008582826
The first one we had initially, and the problem was that some of the optional
features from the provider (plyvel was one example) also throw "No
module" errors (when the Plyvel Hook is imported). They are essentially
indistinguishable from "random" import errors, similar to what you
experienced with Hive, and if we convert them into errors, they will also flood
the logs. We have to remember that the Providers Manager is initialized and hooks
get imported by the webserver every time gunicorn restarts.
So we really need to have some way to distinguish the "known import errors"
that we should essentially ignore from "unknown module errors" that we should
flag as a problem.
I think we must have a way for provider developers to flag certain
features as optional other than by extras (as is the case with Plyvel now).
Currently Plyvel dependencies are only installed if someone runs one of these two:
```
pip install apache-airflow-providers-google[leveldb]
pip install apache-airflow[leveldb]
```
(or manually installed plyvel as dependency)
The library that needs to be installed in order to get LevelDB to "work" is
`plyvel`. So what we really need is a way to know that if the
"plyvel" import fails when we are trying to import the LevelDB Hook, it should be
ignored, as this optional Hook is not expected in this case.
A different exception is a nice idea, but the problem with it is that we
cannot use it in providers if we want to keep Airflow 2.1 compatibility
(because that exception will not exist in Airflow 2.1).
I thought about it and the best thing I came up with is to add a new feature
in `provider.yaml/provider_info` to mark certain Hooks as optional. I think
that might make the most sense:
```
connection-types:
  - hook-class-name: airflow.providers.google.common.hooks.base_google.GoogleBaseHook
    connection-type: google_cloud_platform
  - hook-class-name: airflow.providers.google.cloud.hooks.dataprep.GoogleDataprepHook
    connection-type: dataprep
  - hook-class-name: airflow.providers.google.cloud.hooks.cloud_sql.CloudSQLHook
    connection-type: gcpcloudsql
  - hook-class-name: airflow.providers.google.cloud.hooks.cloud_sql.CloudSQLDatabaseHook
    connection-type: gcpcloudsqldb
  - hook-class-name: airflow.providers.google.cloud.hooks.bigquery.BigQueryHook
    connection-type: gcpbigquery
  - hook-class-name: airflow.providers.google.cloud.hooks.compute_ssh.ComputeEngineSSHHook
    connection-type: gcpssh
  - hook-class-name: airflow.providers.google.leveldb.hooks.leveldb.LevelDBHook
    connection-type: leveldb
    optional-when-import-error:
      - "plyvel"
```
This would allow us to ignore (log only at debug level) specific import errors
for specific hooks, and at the same time we would not have to "hard-code" known
import errors.
I think that solves all the problems.
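To make the idea concrete, here is a rough sketch of how the hook-loading code could use such an entry. It is only an illustration under assumptions: `optional-when-import-error` is the key proposed above (not an existing feature), and `is_expected_import_error` and `load_hook_class` are hypothetical helper names, not the actual Providers Manager code. It relies on the standard `name` attribute of `ImportError`, which holds the name of the module that failed to import (and may be `None`).

```python
import importlib
import logging
from typing import Any, Callable, Dict, Optional

log = logging.getLogger(__name__)


def is_expected_import_error(hook_entry: Dict[str, Any], error: ImportError) -> bool:
    """Return True if the missing module is listed as optional for this hook.

    Hypothetical helper: reads the proposed "optional-when-import-error" key.
    """
    optional_modules = set(hook_entry.get("optional-when-import-error", []))
    # error.name may be None, or a submodule such as "plyvel._plyvel";
    # compare only the top-level package name.
    missing_root = (error.name or "").split(".")[0]
    return missing_root in optional_modules


def load_hook_class(
    hook_entry: Dict[str, Any],
    importer: Callable[[str], Any] = importlib.import_module,
) -> Optional[type]:
    """Import the hook class for one connection-types entry, or return None.

    Hypothetical helper: `importer` is injectable only to make the sketch testable.
    """
    module_path, _, class_name = hook_entry["hook-class-name"].rpartition(".")
    try:
        return getattr(importer(module_path), class_name)
    except ImportError as error:
        if is_expected_import_error(hook_entry, error):
            # Known optional dependency is missing: stay quiet.
            log.debug("Optional hook %s skipped: %s", class_name, error)
        else:
            # Unknown import problem: flag it.
            log.warning("Could not import hook %s: %s", class_name, error)
        return None
```

With the LevelDB entry above, a missing `plyvel` would only produce a debug message, while any other missing module (for example a broken Hive dependency) would still be reported as a warning.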