sachebotarev commented on PR #25714:
URL: https://github.com/apache/airflow/pull/25714#issuecomment-2773650489
1. Your implementation is quite similar to Anton Bryzgalov's older code:
[airflow-clickhouse-plugin](https://github.com/bryzgaloff/airflow-clickhouse-plugin).
While his code is well-suited for plugins, it seems somewhat out of place in a
provider.
> Airflow's plugin functionality predates the provider system, and many
companies still use his code within an Airflow plugin. Additionally, the
developer has registered his package to be listed under the Apache organization
so the community can maintain it:
[airflow-providers-clickhouse](https://pypi.org/project/airflow-providers-clickhouse/).
>> Sorry, but the code of this plugin is bad, I don’t think anyone would be
interested in maintaining it.
>
2. Calling self.get_conn() in the constructor is concerning since the
connection is never explicitly closed.
> ClickHouse, like Spark, performs better with a limited number of open
connections.
>> So, now it’s not necessary to close connections?
> 3. The clickhouse-driver library supports DB API 2.0, so inheriting from
DbApiHook and ClickHouseOperator provides many built-in functionalities. The
insert_rows() method can be overridden using Cursor.executemany() or by raising
an exception. cursor.executemany('INSERT INTO test (x) VALUES', [[200]])
> Executing bulk inserts in parallel can be resource-intensive for
ClickHouse. The optimal approach is to insert between 10,000 and 100,000 rows
in a single query for better performance.
> > And how does this contradict what I wrote?
4. Your ClickHouseHook provides very limited functionality, compare for
example with ExasolHook (also an analytical database).
> The airflow should start this providers as MVP. There are many companies
using clickhouse and airflow together, it doesn't feels right that these two
open source projects are not collaborating in terms of tech.
>> The plugin is frankly bad even as an MVP, I don’t understand why anyone
would cling to it.
5. Your ClickHouseOperator does not provide the same capabilities (last
result, many requests in template, xcom ).
I would get these features (and more) just using SQLExecuteQueryOperator
> Clickhouse is similar to the Sql in terms of query language but different
in execution at the back end.
>>At the internal level of the DBMS, but not the plugin.
If you’re interested, two years ago, when I wrote this comment, I created my
own implementation for the company I was working for.
[clickhouse_hook.txt](https://github.com/user-attachments/files/19575301/clickhouse_hook.txt)
[clickhouse_operator.txt](https://github.com/user-attachments/files/19575307/clickhouse_operator.txt)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]