amolsr commented on PR #25714: URL: https://github.com/apache/airflow/pull/25714#issuecomment-2773038342
1. Your implementation is quite similar to Anton Bryzgalov's older code: [airflow-clickhouse-plugin](https://github.com/bryzgaloff/airflow-clickhouse-plugin). While his code is well-suited for plugins, it seems somewhat out of place in a provider. > Airflow's plugin functionality predates the provider system, and many companies still use his code within an Airflow plugin. Additionally, the developer has registered his package to be listed under the Apache organization so the community can maintain it: [airflow-providers-clickhouse](https://pypi.org/project/airflow-providers-clickhouse/). 2. Calling self.get_conn() in the constructor is concerning since the connection is never explicitly closed. > ClickHouse, like Spark, performs better with a limited number of open connections. 3. The clickhouse-driver library supports DB API 2.0, so inheriting from DbApiHook and ClickHouseOperator provides many built-in functionalities. The insert_rows() method can be overridden using Cursor.executemany() or by raising an exception. cursor.executemany('INSERT INTO test (x) VALUES', [[200]]) > Executing bulk inserts in parallel can be resource-intensive for ClickHouse. The optimal approach is to insert between 10,000 and 100,000 rows in a single query for better performance. 4. Your ClickHouseHook provides very limited functionality, compare for example with ExasolHook (also an analytical database). > The airflow should start this providers as MVP. There are many companies using clickhouse and airflow together, it doesn't feels right that these two open source projects are not collaborating in terms of tech. 5. Your ClickHouseOperator does not provide the same capabilities (last result, many requests in template, xcom ). I would get these features (and more) just using SQLExecuteQueryOperator > Clickhouse is similar to the Sql in terms of query language but different in execution at the back end. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
