bolkedebruin commented on PR #35598:
URL: https://github.com/apache/airflow/pull/35598#issuecomment-1808019785
Oh I like your thinking. On the Object Store note I think you have a point
to a certain degree. What I would like to do is to move
`airflow.io.store.path.ObjectStorePath` to `airflow.io.ObjectStorePath` and
possibly move `airflow.io.store` to `airflow.io.cloud` or `airflow.io.Store`
(notice the capital S) - up for suggestions. The naming was obviously inspired
by what cloud providers call their object storage, well... object storage.
I think that, in general, while Object Storage is somewhat file based, it does
not confuse people, as some database engines can make use of tabular data on
object storage (think Spark, DuckDB, etc.). I don't think people really expect
`db = ObjectStore(database=xxx)` to work (what would be expected?), but rather
something like (pseudo) `db = spark.connect(ObjectStoragePath(xxx))` or `db =
spark.connect(Catalog(xxx))`. But maybe I am wrong?
Thinking about the information architecture in Airflow, and given the rise of
catalogs like Iceberg's REST catalog and Unity Catalog, wdyt about:
```
ObjectStoragePath
Catalog
|
-> load_catalog(name=xxx)
--> get_table (Iceberg, Unity, or ? )
--> get_connection() -> DBConnection (SQLAlchemy?)
--> get_raw_connection() -> Connection
--> get_path() -> ObjectStoragePath
```
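To make the idea above a bit more concrete, here is a minimal sketch of what that catalog layer could look like. All names (`Catalog`, `Table`, `load_catalog`, the `_REGISTRY` lookup) are hypothetical stand-ins for the proposal, not existing Airflow APIs, and the string `location` stands in for a real `ObjectStoragePath`:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Table:
    """Hypothetical table handle returned by a catalog lookup."""

    name: str
    # fsspec-style URI, standing in for an ObjectStoragePath
    location: str

    def get_path(self) -> str:
        """Return the storage path backing this table."""
        return self.location

    def get_connection(self) -> Any:
        """Would hand back a SQLAlchemy-style connection; stubbed here."""
        raise NotImplementedError("wire up SQLAlchemy here")


@dataclass
class Catalog:
    """Hypothetical catalog resolved from a registered Airflow connection name."""

    name: str
    tables: dict[str, Table] = field(default_factory=dict)

    def get_table(self, name: str) -> Table:
        return self.tables[name]


# Stand-in for Airflow's connection registry.
_REGISTRY: dict[str, Catalog] = {}


def load_catalog(name: str) -> Catalog:
    """Look up a catalog by its registered connection name."""
    return _REGISTRY[name]


# Usage: the user/author only needs the catalog name, not the dialect.
_REGISTRY["lake"] = Catalog(
    "lake", {"events": Table("events", "s3://bucket/events")}
)
path = load_catalog("lake").get_table("events").get_path()
```

The point of the sketch is the lookup chain: `load_catalog` resolves a name to a catalog, and from a table you can get either a database-style connection or the underlying storage path, without the caller knowing which backend serves it.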
This way you wouldn't need to know the dialect as a user / author. You would
use the catalog name that is registered within Airflow as a Connection.
P.S. If we are talking databases, I would rename `protocol` to `dialect`.
Also, I'm not overly charmed by using `Database.default`; I would just use
`Database(dialect=xxx)` to allow for default connections (as
ObjectStoragePath does).
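A rough sketch of that default-connection behaviour, assuming a hypothetical `Database` class and a made-up per-dialect default mapping (neither exists in Airflow today; it just mirrors how `ObjectStoragePath` falls back to a default connection when none is supplied):

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from dialect to a default Airflow connection id.
DEFAULT_CONN_IDS = {"postgresql": "postgres_default", "sqlite": "sqlite_default"}


@dataclass
class Database:
    """Hypothetical handle: dialect is required, conn_id falls back to a default."""

    dialect: str
    conn_id: str | None = None

    def __post_init__(self) -> None:
        # No explicit connection given: derive one from the dialect,
        # analogous to ObjectStoragePath's default-connection fallback.
        if self.conn_id is None:
            self.conn_id = DEFAULT_CONN_IDS.get(
                self.dialect, f"{self.dialect}_default"
            )
```

So `Database(dialect="postgresql")` picks up the default connection, while `Database(dialect="postgresql", conn_id="warehouse")` overrides it, and there is no separate `Database.default` entry point to learn.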
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]