morningman opened a new issue #6521:
URL: https://github.com/apache/incubator-doris/issues/6521


   These are the minutes of the developer meeting. We will update this summary 
after each meeting.
   Developers can use it to track the progress of related issues.
   
   -----------------
   
   1. Novice task
   
       The goal of the novice tasks is to bring more developers into community 
building. Developers who are contributing to open source for the first time can 
pick a task from the novice task list, which helps them become familiar with 
the submission process and experience how welcoming the community is. We have 
looked at how mature projects such as Apache Pulsar and Apache DolphinScheduler 
handle this. The novice task list is currently being planned and will be 
released in the near future.
   
   2. SIG (Special Interest Group)
   
       We will set up special interest groups by module, classify PRs and 
issues by module, and route them to the corresponding groups. This makes it 
easier to discuss related issues together, and people who are interested in a 
particular module can follow its development progress.
   
       Doris has already established SIGs in several areas, including Doris 
Manager and vectorization. We will gradually open more SIGs in the future. 
Everyone is welcome to participate.
   
   3. Document construction
   
       The Doris documentation is currently incomplete. In some cases the 
documentation was never written during development, and in others it was not 
updated when a feature changed. There are also some problems with the document 
format. We hope to restructure the documentation as a whole.
   
       Restructuring the documentation will help Doris improve its syntax 
manual. It is also a relatively friendly task for novices: it helps 
contributors become familiar with Doris's features quickly and integrate into 
the community. We have created an issue on GitHub with a document example and 
built an empty framework for all documents, so contributors only need to fill 
in the content. We hope that more people who want to join community building 
will take part in the documentation work.
       
       #6336 
   
       This is also a kind of novice task.
   
   4. Regression testing
   
       Currently, for each submitted PR, the community only provides unit tests 
and a minimal test set of more than 100 cases to ensure code quality. Baidu has 
a complete regression test suite for daily regression testing, but it is not 
yet visible to external developers, which makes it hard for community 
developers to run those tests.
       
       Going forward, we will try to provide a regression testing framework 
that lets developers add and improve cases, to further ensure code quality.
   
   ## Feature development
   
   1. Vectorized execution engine
   
       By vectorizing the operators, we want to significantly improve Doris's 
query performance. This work involves refactoring the code of all operators and 
the storage layer, and it will be one of the community's key development 
directions this year.
       
       The first stage of this work has been completed, and vectorized 
execution on a single table now works. The first version is expected to be 
released in September. Follow-up work is also underway and is expected to be 
available in Q4. A SIG for this work has been established. If you are 
interested in participating, please contact us.
   
       #6238
   
   2. Doris Manager visual operation and maintenance monitoring platform
   
       Doris Manager is mainly intended to provide cluster monitoring and to 
support operation and maintenance tasks such as a cluster deployment wizard, 
node management, rolling upgrades, and online scale-out and scale-in. Doris 
Manager is currently under intensive development. In September, we will release 
a first version to the community. More community members are welcome to join, 
especially front-end developers. Everyone is also welcome to report problems 
encountered during operation and maintenance. Once these are abstracted into 
product features, we can add them to the Doris Manager feature list.
   
       [doris-manager branch](https://github.com/apache/incubator-doris/tree/doris-manager)
   
   3. New Query Optimizer
   
       The query optimizer is one of the most important components of Doris. 
The current query optimizer framework has problems such as an unclear layered 
design and poor extensibility. We hope to launch the first working version of 
the new query optimizer by the end of the year. So far, some framework design 
verification and the development of supporting features, such as statistics 
collection, have been carried out. Developers with experience building query 
optimizers are welcome to contact us.
   
       Finally, we hope everyone will join in. First, we would like to collect 
suggestions for a new name for the New Optimizer, one that highlights a 
characteristic of it, such as the ability to find the best plan quickly and 
accurately. We also sincerely invite people who have ideas about the New 
Optimizer to work with us. We are looking for contributors with strong 
enthusiasm for optimizer technology. It is even better if you understand the 
Cascades framework or have experience developing the optimizers of other open 
source projects such as Spark and Presto. We have also prepared some relatively 
simple tasks for novices and hope that everyone can participate.
   
       #6483
   
   4. Resource isolation
   
       Resource isolation is also a feature that many users care about. For 
databases with an MPP architecture, resource isolation is a hard problem, 
because MPP is designed to use as many cluster resources as possible to process 
each query. If multiple tasks run at the same time, they will inevitably 
compete for resources.
   
       At present, we are working on two parts. The first is resource labeling: 
the storage and compute nodes in a Doris cluster are divided into resource 
label groups, so that cluster resources can be partitioned and isolated at the 
node level. The second is a per-query resource limit, which uses parameters to 
limit the CPU usage of a query on a single node. This is better suited to 
scenarios where users run scheduled tasks and are not sensitive to latency. 
Both parts have been developed and will soon be merged into the community 
codebase.
   
       #5902
       #6442
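   
       As a rough illustration of the two mechanisms described above, the 
sketch below filters nodes by a resource label and caps the CPU cores that one 
query may use on a node. The `Node` class, the label names, and the `cpu_limit` 
parameter are hypothetical and chosen only for this example; they are not 
Doris's actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    host: str
    label: str       # hypothetical resource label, e.g. "online" or "batch"
    cpu_cores: int


def nodes_for_label(nodes, label):
    """Node-level isolation: a workload only sees nodes carrying its label."""
    return [n for n in nodes if n.label == label]


def query_cpu_budget(node, cpu_limit):
    """Per-query limit: cap the cores one query may use on a single node."""
    return min(node.cpu_cores, cpu_limit)


# Hypothetical cluster: two nodes for online queries, one for batch jobs.
cluster = [
    Node("be1", "online", 32),
    Node("be2", "online", 32),
    Node("be3", "batch", 16),
]

batch_nodes = nodes_for_label(cluster, "batch")
budgets = {n.host: query_cpu_budget(n, cpu_limit=4) for n in batch_nodes}
```

       In this sketch a scheduled batch job would only be placed on `be3` and 
would use at most 4 cores there, leaving the nodes labeled `online` untouched.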
   
   5. Z-Order Indexing
   
       Doris currently stores data sorted by the prefix columns, so when a 
query includes conditions on the prefix columns, it can search the sorted data 
quickly. However, if the query condition is not on a prefix column, the sort 
order cannot be used for fast lookup. After investigation, we found that 
Z-Order indexing can solve this problem and filters well in dashboard (Kanban) 
style multi-column query scenarios. The algorithm is basically complete, and 
related testing and verification work is in progress.
       
       Z-Order may also degrade write performance to some extent. Although 
current test results show that the impact is not significant, the test 
conclusions need to be further refined.
   
       #6359 
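   
       To make the idea concrete, here is a minimal sketch of Z-order (Morton) 
encoding for two integer columns, compared with a plain prefix sort. It is not 
the implementation from the issue above. The block size of 4 and the min/max 
pruning in `blocks_touched` are simplifying assumptions made only for this 
illustration.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return key


def blocks_touched(sorted_rows, col, value, block_size=4):
    """Count blocks whose [min, max] range on column `col` may hold `value`."""
    touched = 0
    for start in range(0, len(sorted_rows), block_size):
        vals = [row[col] for row in sorted_rows[start:start + block_size]]
        if min(vals) <= value <= max(vals):
            touched += 1
    return touched


# A 4 x 4 grid of (x, y) rows, sorted two different ways.
rows = [(x, y) for x in range(4) for y in range(4)]
rows_by_prefix = sorted(rows)  # sorted by x first, then y
rows_by_z = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))

prefix_x = blocks_touched(rows_by_prefix, 0, 0)  # filter on the prefix column
prefix_y = blocks_touched(rows_by_prefix, 1, 0)  # filter on a non-prefix column
z_x = blocks_touched(rows_by_z, 0, 0)
z_y = blocks_touched(rows_by_z, 1, 0)
```

       On this toy grid the prefix sort reads 1 block for a filter on `x` but 
all 4 blocks for a filter on `y`, while the Z-order sort reads 2 blocks in both 
cases. This is the trade-off described above: somewhat less pruning on the 
leading column in exchange for useful pruning on the other columns.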
   
   6. PreparedStatement
   
       Doris currently does not support server-side prepared statements. 
Prepared statements can effectively prevent SQL injection and, in some 
scenarios, reduce the overhead of repeatedly parsing query statements.
       
       As for SQL injection, the MySQL drivers for most languages support 
client-side prepared statements, which solves most SQL injection problems.
       
       We will continue to investigate server-side prepared statements.
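   
       The following sketch shows why binding parameters prevents SQL 
injection. It uses Python's built-in `sqlite3` module purely as a stand-in so 
that the example is self-contained. It does not connect to Doris, and the 
`users` table and its rows are made up for the illustration. A MySQL-protocol 
driver used with Doris would expose a similar placeholder-based API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "viewer")],
)

# Attacker-controlled input that tries to turn the filter into a tautology.
malicious = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes its structure.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

conn.close()
```

       The concatenated query returns every row in the table, while the 
parameterized query returns none, because no user is literally named 
`nobody' OR '1'='1`.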
   
   7. InternalErrorCode
   
       At present, the error codes in Doris are rather confusing, which makes 
it hard for programs to handle and classify errors. In the future, we will sort 
out the error codes and present error messages in a more standardized form.
   
       #6357 
   
   8. Import performance optimization
   
       The import performance of Doris still has a lot of room for 
optimization, especially in building the in-memory structure and in the disk 
write stage. This needs further analysis.
   
       #6398 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


