[I] [Draft] Doris Roadmap 2024 [doris]

via GitHub Wed, 31 Jan 2024 07:03:00 -0800


morningman opened a new issue, #30669:
URL: https://github.com/apache/doris/issues/30669


   
   [Roadmap 2023](https://github.com/apache/doris/issues/16392)
   [Roadmap 2022](https://github.com/apache/doris/issues/7502)
   
   ## Separation of Storage and Computation
   
   - Flexibility & Stateless
       - [ ] Stateless BE node
       - [ ] Stateless FE node
   - Load Isolation
       - [ ] Multi-cluster support
       - [ ] Read & write isolation
   - More storage support
     - [x] AWS S3
     - [x] Aliyun OSS 
     - [x] Tencent Cloud COS
     - [x] Huawei Cloud OBS
     - [ ] GCP
     - [ ] Azure
     - [ ] HDFS
   - Performance
     - [x] Optimized cache policy
     - [ ] Optimization for cold data querying
   - [x] Support data deletion
   - SLA
     - [ ] Upgrade BE with no impact
     - [ ] Upgrade FE with no impact
   - Reliability
     - [ ] Snapshot & Time travel
     - [ ] Enhanced backup & restore
   - [ ] Data sharing
   
   ## [Async Materialized View](https://doris.apache.org/docs/dev/query-acceleration/async-materialized-view)
   
   - Build materialized view
       - [x] Support full refresh
       - [x] Support partition level refresh
       - [x] Support building mv from olap table
       - [x] Support building mv from hive table
       - [ ] Support building mv from iceberg table
       - [ ] Support building mv from hudi table
       - [ ] Nested materialized view with DAG
       - [ ] Incremental building for external table with partition granularity
       - [ ] Support partition rollup
       - [ ] Support partition TTL
       - [ ] Support `REPLACE` operation
       - [ ] Support refresh materialized view by time range
   
   - [Transparent Rewriting](https://doris.apache.org/docs/dev/query-acceleration/async-materialized-view/query-async-materialized-view)
       - [x] Support aggregation and rollup
       - [x] Support join
       - [ ] Query Partial rewriting
       - [ ] Rewriting supports nested materialized view
   
   - Materialized view management
       - [ ] Materialized view recommendation
   
   ## Semi Structure Data Analysis
   
   - Inverted Index
       - [x] Support [Inverted Index](https://doris.apache.org/docs/dev/data-table/index/inverted-index/)
       - [ ] Merging index files
       - [ ] Working with separation of storage and computation
       - [ ] Speed up the data loading with inverted index
   
   - VARIANT data type
       - [ ] Support `VARIANT` data type
       - [ ] Working with inverted index
   
   ## Query Optimizer
   
   - Basic framework
      - [ ] Fully supports DQL, DML and DDL
      - [ ] Optimized memory consumption
      - [ ] Optimized apply order of RBO rules
      - [ ] Improved efficiency of Cascades enumeration
   - Planning quality
      - Statistics
        - [ ] Support statistics for synced materialized views
        - [ ] Support partition level statistics collection
        - [ ] Support histogram statistics collection
      - New distributed cost model
        - [ ] Optimized distributed cost model framework
        - [ ] Support runtime cost re-evaluation
        - [ ] Support more accurate operator cost fitting models
      - Rules and enumerations
        - [ ] Expand RBO rules
        - [ ] Improve the quality of Cascades enumeration plan
        - [ ] Enhance the dphyper enumeration framework to support outer join enumeration and CDC
      - Enhance runtime filter adaptive capability
        - [ ] Adaptive runtime filter size
        - [ ] Adaptive runtime filter type
        - [ ] Adaptive runtime filter waiting time
      - [ ] Support a histogram-based adaptive framework for handling data skew
   
   ## DataLake Analytics 
   
   - Support more file formats
       - [ ] RCFile
       - [ ] SequenceFile
   
   - Support more lake formats
       - [ ] Support Iceberg with ORC
       - [ ] Support Iceberg Equality Delete
       - [ ] Support more system tables on Hudi
       - [ ] Support CDC scan on Hudi
       - [ ] Support more system tables on Paimon
   
   - Trino Connector compatibility
       - [ ] Trino Connector compatibility framework
       - [ ] Support Trino DeltaLake Connector
       - [ ] Support Trino BigQuery Connector
       - [ ] Support Trino Cassandra Connector
   
   - Datalake write back
       - Hive
           - [ ] Support unpartitioned table
           - [ ] Support partitioned table
           - [ ] Support `INSERT OVERWRITE`
           - [ ] Support `INSERT`
       - Iceberg
           - [ ] Support unpartitioned table
           - [ ] Support partitioned table
           - [ ] Support update and delete
       - Hudi
       - Paimon
   
   - Enhanced JDBC Catalog
       - [ ] Support DB2
       - [ ] Support sharded database
       - [ ] Support query concurrency
   
   - Enhanced file analysis
       - [ ] Support `INSERT INTO` table value function
   
   - Enhanced file cache
       - [ ] Support memory-level file cache
       - [ ] Enhanced cache statistics and hit analysis
   
   - Integrate with Apache Ranger
       - [x] Support Catalog/Database/Table/Resource/WorkloadGroup auth
       - [ ] Support row policy
       - [ ] Support data mask
       - [ ] Support column level privilege
   
   - [SQL dialect support](https://doris.apache.org/docs/dev/lakehouse/sql-dialect)
        - [x] Presto/Trino
        - [ ] Spark
        - [ ] Hive
        - [ ] ClickHouse
        - [ ] Oracle
   
   ## Query Processing
   
   - [ ] Support stored procedures
   - Support Spill to disk
       - [ ] Sort Operator
       - [ ] Aggregate Operator
       - [ ] Join Operator
   - [ ] Working with shuffle service
   - [ ] Stage by stage query processing
   
   ## Storage Engine
   
   - Data Loading
       - [x] Support auto partition when loading
       - [ ] Zero-ETL: Built-in data integration from OLTP CDC to Doris
       - [ ] Support transactional multi-table `INSERT INTO`
       - [ ] Support `MERGE INTO`
   - Data Modeling
       - [x] Auto incremental column
       - [ ] Support `CLUSTER BY`
       - [ ] Support KEY column in arbitrary order
   - CCR (Cross-Cluster Replication)
       - [ ] Support Master/Slave switch
       - [ ] Support cross region deployment
       - [ ] Work with separation of storage and computation
   - [ ] Support data binlog
   - [ ] Enhanced Z-order index
   - [ ] Optimized high-concurrency point query
   
   ## Ecosystem & Tools
   
   - Cluster Manager for Apache Doris
   - [x] Doris StreamLoader tool        
   - X2Doris
       - [x] Support Hive to Doris
       - [x] Support Doris to Doris
       - [x] Support Kudu to Doris
       - [x] Support StarRocks to Doris
       - [ ] Support ClickHouse to Doris
   - BI tools compatibility
       - [x] Superset
       - [ ] Metabase
       - [ ] Navicat
       - [ ] DataGrip
       - [ ] DBeaver
       - [ ] SmartBI
       - [ ] FineBI
   - Data Integration
       - [ ] Kettle
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


