kinghao007 opened a new issue, #5323:
URL: https://github.com/apache/linkis/issues/5323

   ### Search before asking
   
   - [x] I searched the [issues](https://github.com/apache/linkis/issues) and 
found no similar issues.
   
   ### Linkis Component
   
   linkis-engineconn-plugins
   
   ### What happened
   
   **English:**
   
   Linkis currently lacks native support for ClickHouse, a widely used 
high-performance columnar OLAP engine. Companies such as ByteDance, Tencent, 
Alibaba, Meituan, and JD.com have adopted ClickHouse for real-time analytics, 
user behavior analysis, and log analysis because of its query performance, 
which is often cited as 10-100x faster than traditional MPP databases.
   
   **Market Demand:**
   - **Performance Benchmark**: ClickHouse is recognized as the OLAP 
performance standard, with single-table queries 10-100x faster than traditional 
MPP databases
   - **Real-time Data Ingestion**: Supports write throughput of millions of 
rows per second
   - **Columnar Storage**: Compression ratios of around 10:1, significantly 
reducing storage costs
   - **High Market Penetration**: Widely used in finance (BOC, CMB, Ping An), 
telecom (China Mobile, China Unicom), and internet sectors
   - **Active Community**: 34k+ GitHub stars, one of the fastest-growing OLAP 
databases
   
   **Strategic Value:**
   ClickHouse complements the existing engines (Doris, Presto/Trino); together 
they form a golden triangle of performance, concurrency, and flexibility:
   - **ClickHouse**: Best for queries over a single wide table where raw 
performance matters most
   - **Doris**: Best for multi-dimensional analysis and BI reports
   - **Presto/Trino**: Best for data lake queries and federated queries
   
   ### What you expected to happen
   
   **English:**
   
   Linkis should provide a ClickHouse engine plugin with the following 
capabilities:
   
   1. **SQL Query Support:**
      - Standard SQL syntax support
      - Distributed table and local table query support
      - MergeTree family table engine support
      - Materialized view support
   
   2. **Data Operations:**
      - INSERT operations for batch data loading
      - SELECT queries with complex aggregations
      - JOIN operations across tables
      - Support for ClickHouse-specific functions
   
   3. **Connection Management:**
      - Support for JDBC and HTTP protocols
      - Connection pooling and reuse
      - Support for distributed clusters
      - Authentication and authorization
   
   4. **Performance Optimization:**
      - Query result streaming to avoid OOM
      - Support for sampling queries
      - Query timeout and cancellation
      - Resource usage monitoring
   
   5. **Integration with Linkis:**
      - Unified task submission interface
      - Resource management integration
      - Permission control integration
      - Metadata management support
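
To make the connection-management and performance items above more concrete, here is a minimal sketch in Python, not a proposed Linkis implementation. It assumes the plugin talks to ClickHouse's HTTP interface (default port 8123), passes the server-side `max_execution_time` setting as a URL parameter to enforce a query timeout, and reads a `JSONEachRow` response one line at a time so the full result never has to sit in memory. The function names `build_query_request` and `iter_rows`, the host, and the credentials are hypothetical and chosen only for illustration.

```python
import io
import json
from typing import BinaryIO, Iterator
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(base_url: str, sql: str, user: str, password: str,
                        timeout_s: int = 30) -> Request:
    """Build (but do not send) a POST request for the ClickHouse HTTP interface.

    Hypothetical helper: settings are passed as URL parameters, the SQL text
    goes in the request body, and credentials go in X-ClickHouse-* headers.
    """
    settings = {
        "max_execution_time": timeout_s,   # server-side query timeout, seconds
        "default_format": "JSONEachRow",   # one JSON object per output line
    }
    url = f"{base_url.rstrip('/')}/?{urlencode(settings)}"
    headers = {"X-ClickHouse-User": user, "X-ClickHouse-Key": password}
    return Request(url, data=sql.encode("utf-8"), headers=headers, method="POST")


def iter_rows(body: BinaryIO) -> Iterator[dict]:
    """Yield rows one at a time from a JSONEachRow response body.

    Works on any line-iterable binary stream, such as the object returned by
    urllib.request.urlopen, so rows are decoded lazily instead of all at once.
    """
    for line in body:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


if __name__ == "__main__":
    # Assumed local endpoint and default user; nothing is sent over the network.
    req = build_query_request("http://localhost:8123",
                              "SELECT count() FROM large_table SAMPLE 0.1",
                              user="default", password="")
    print(req.get_method(), req.full_url)

    # Stand-in for a real response body, to show the streaming reader.
    fake_body = io.BytesIO(b'{"user_id": 1, "event_count": 5}\n'
                           b'{"user_id": 2, "event_count": 3}\n')
    for row in iter_rows(fake_body):
        print(row)
```

A JDBC-based plugin would more likely rely on the driver's own fetch-size and query-timeout controls; the HTTP variant is shown here only because it can be sketched without any third-party dependency.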
   
   ### How to reproduce
   
   **English:**
   
   Current situation:
   1. Users must set up ClickHouse JDBC connections manually
   2. There is no dedicated engine plugin for ClickHouse operations
   3. Users cannot leverage Linkis's unified task submission and resource 
management
   4. Support for ClickHouse-specific features is limited
   
   Use case example:
   ```sql
   -- User wants to query ClickHouse for real-time analytics
   SELECT
       toDate(event_time) as date,
       user_id,
       count() as event_count,
       uniq(session_id) as session_count
   FROM events_distributed
   WHERE event_time >= today() - 7
   GROUP BY date, user_id
   ORDER BY date DESC, event_count DESC
   LIMIT 1000;
   
   -- Advanced features such as sampling are not supported
   SELECT count() FROM large_table SAMPLE 0.1;
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
