morningman opened a new issue #6521:
URL: https://github.com/apache/incubator-doris/issues/6521
These are the minutes of the developer meeting. We will update this summary
after each meeting so that developers can use it to track the progress of
related issues.
-----------------
1. Novice task
The goal of the novice task is to bring more developers into community
building. Developers who are contributing to open source for the first time
can pick an item from the novice task list, which helps them become familiar
with the submission process and experience how friendly the community is. We
have drawn on the practice of mature projects such as Apache Pulsar and Apache
DolphinScheduler. The novice task list is currently being planned and will be
released in the near future.
2. SIG (Special Interest Group)
We will set up special interest groups by module, classify PRs and issues by
module, and route them to the corresponding groups. This makes it easier to
discuss related issues together and lets people who are interested in
particular modules follow the progress of community development.
Doris has already established SIGs in several directions, including Doris
Manager and vectorization. We will gradually open more SIGs in the future.
Everyone is welcome to participate.
3. Document construction
At present, the Doris documentation is incomplete. In some cases the
documentation was never written during development, and in others it was not
updated when the feature was iterated. There are also some problems with the
document format. We hope to carry out an overall restructuring.
An overall refactoring of the documentation can, on the one hand, help Doris
improve the syntax manual. On the other hand, it is a relatively friendly task
for novices: it helps everyone become familiar with Doris's functions and
integrate into the community quickly. We have created a new issue on GitHub
with a document example and built an empty framework for all documents, so
contributors only need to fill in the content. We hope that more people who
want to join community building will take part in the documentation work.
#6336
This is also a kind of novice task.
4. Regression testing
Currently, for submitted PRs, the community only provides unit tests and a
minimum test set containing more than 100 cases to ensure code quality.
Although Baidu has a complete regression test set for daily regression
testing, it is temporarily not visible to external developers, which makes it
hard for community developers to run tests.
As a follow-up, we will try to provide a regression testing framework that
lets developers add and improve cases, so as to further ensure code quality.
## Feature development
1. Vectorized execution engine
By converting operators to vectorized implementations, we want to
significantly improve Doris's query performance. This work involves
refactoring the code of all operators and the storage layer, and will be one
of the community's key research and development directions this year.
The first stage of the work has been completed, and vectorized execution on a
single table is now possible. The first version is expected to be released in
September. Follow-up work is also underway and is expected to be available in
Q4. The SIG for this work has been established. If you are interested in
participating, please contact us.
#6238
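As a rough illustration of the idea only (this is not Doris code, and the
column names, batch size, and function names are all assumptions made for the
example), the sketch below contrasts a row-at-a-time filter-and-sum with a
version that processes columns in batches, where each operator handles a whole
batch before handing it to the next one:

```python
from typing import List, Tuple

# Hypothetical row layout for the example: (status, amount).
Row = Tuple[int, float]


def sum_row_at_a_time(rows: List[Row], wanted_status: int) -> float:
    """Evaluate the filter and the aggregate once per row."""
    total = 0.0
    for status, amount in rows:
        if status == wanted_status:
            total += amount
    return total


def sum_in_batches(status_col: List[int], amount_col: List[float],
                   wanted_status: int, batch_size: int = 4) -> float:
    """Process column slices; each operator works on a whole batch."""
    total = 0.0
    for start in range(0, len(status_col), batch_size):
        statuses = status_col[start:start + batch_size]
        amounts = amount_col[start:start + batch_size]
        # Filter operator: produce a selection vector for the batch.
        selected = [i for i, s in enumerate(statuses) if s == wanted_status]
        # Aggregate operator: consume only the selected positions.
        total += sum(amounts[i] for i in selected)
    return total


rows = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 1.0), (1, 4.0)]
status_col = [r[0] for r in rows]
amount_col = [r[1] for r in rows]
```

Both functions return the same result for the same data. In a real engine the
benefit comes from tight loops over contiguous column data, which pure Python
lists do not reproduce.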
2. Doris Manager visual operation and maintenance monitoring platform
Doris Manager is mainly intended to provide cluster monitoring and to support
operation and maintenance tasks such as a cluster deployment wizard, node
management, rolling upgrades, and online scale-out and scale-in. Doris Manager
is currently in an intensive development phase. In September, we will release
a first version to the community. More community members are welcome to join,
especially front-end developers. Everyone is also welcome to report problems
encountered during operation and maintenance. Once these are abstracted into
product features, we can add them to the Doris Manager feature list.
[doris-manager branch](https://github.com/apache/incubator-doris/tree/doris-manager)
3. New Query Optimizer
The query optimizer is one of the most important components of Doris. The
current query optimizer framework has problems such as an unclear layered
design and poor extensibility. We hope to launch the first working version of
the new query optimizer by the end of the year. So far, some framework design
verification work and the development of peripheral functions, such as the
collection of statistics, have been carried out. Developers who have
experience building query optimizers are welcome to contact us.
Finally, we hope everyone will join in. First, we would like to collect a new
name for the New Optimizer, one that highlights a characteristic of it, such
as the ability to locate the best plan quickly and accurately. We also
sincerely invite people who have ideas about the New Optimizer to cooperate.
We are looking for strong enthusiasm for optimizer technology. It is a plus if
you understand the Cascades optimizer framework or have experience developing
the optimizers of other open source products such as Spark and Presto. We have
also prepared some relatively simple tasks for novices, and hope that everyone
can participate.
#6483
4. Resource isolation
Resource isolation is also a function that many users care about. For an MPP
database, resource isolation is a hard problem, because the MPP architecture
is designed to use as many cluster resources as possible to process each
query. If multiple tasks run at the same time, resource preemption will
inevitably occur.
At present, we are mainly working on two parts. The first is resource
labeling: the storage and computing nodes in a Doris cluster are divided into
resource label groups, so that the resources in the cluster can be divided and
isolated at the node level. The second is a resource limit for a single query,
which limits the CPU usage of a query on a single node through parameters.
This is better suited to scenarios where users run scheduled tasks and are not
sensitive to latency. Both parts have been developed and will soon be merged
into the community codebase.
#5902
#6442
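The two mechanisms above can be pictured with a small model. This is only a
sketch under assumptions: the `Node` class, the label names, and the
`cpu_limit` parameter are hypothetical and do not reflect the actual Doris
implementation or its configuration names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    host: str
    tag: str  # hypothetical resource label assigned to this node


def group_by_tag(nodes: List[Node]) -> Dict[str, List[Node]]:
    """Divide the cluster into resource label groups."""
    groups: Dict[str, List[Node]] = {}
    for node in nodes:
        groups.setdefault(node.tag, []).append(node)
    return groups


def nodes_for_query(groups: Dict[str, List[Node]], tag: str) -> List[Node]:
    """A query bound to a label may only use nodes in that group."""
    return list(groups.get(tag, []))


def threads_for_query(cpu_limit: int, node_cores: int) -> int:
    """Cap the per-node parallelism of one query (hypothetical parameter)."""
    return max(1, min(cpu_limit, node_cores))


cluster = [
    Node("be1", "online"),
    Node("be2", "online"),
    Node("be3", "batch"),
]
groups = group_by_tag(cluster)
```

In this model, a query bound to the `batch` label would only ever be scheduled
on `be3`, and a `cpu_limit` of 2 on a 16-core node would hold it to two
threads there.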
5. Z-Order Indexing
Doris currently sorts and stores data by the prefix columns, so when a query
includes conditions on the prefix columns, it can search the sorted data
quickly. However, if the query condition is not on a prefix column, the sort
order cannot be used for fast lookup. After investigation, we found that
Z-Order Indexing can solve this problem and filters well in Kanban-type
multi-column query scenarios. The algorithm has basically been developed, and
related testing and verification work is in progress.
At the same time, Z-Order may cause some degradation in write performance.
Although current test results show that the performance impact is not
significant, the test conclusions need to be further refined.
#6359
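For readers unfamiliar with Z-Order, the core idea is to interleave the bits
of several column values into a single sort key, so that rows close together
in any of the columns tend to end up near each other on disk. The sketch below
shows a generic two-column Morton encoding. It is not the implementation from
the issue above, and the function name and the 16-bit width are assumptions
chosen for the example.

```python
from typing import List, Tuple


def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton code.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("values must fit in the given number of bits")
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def sort_by_z_order(rows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Order rows by their interleaved key instead of by the first column."""
    return sorted(rows, key=lambda row: z_order_key(row[0], row[1]))


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
ordered = sort_by_z_order(points)
```

Because neither column dominates the key, a range filter on either column can
skip large parts of the sorted data, whereas a plain prefix sort only helps
filters on the leading column. Computing the key for every row on write is one
plausible source of the write overhead mentioned above.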
6. PreparedStatement
Doris currently does not support server-side prepared statements. Prepared
statements can effectively prevent SQL injection, and in some scenarios they
reduce the overhead of repeatedly parsing query statements.
For SQL injection, the MySQL drivers of most languages support client-side
prepared statements, which solves most SQL injection problems.
We will continue to investigate server-side prepared statements.
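To show why parameter binding protects against injection, here is a minimal
sketch that uses Python's built-in `sqlite3` module as a stand-in. It does not
connect to Doris or use a MySQL driver, and the `users` table and function
names are made up for the example. The point is only that a bound parameter is
always treated as a value and never as SQL text.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    return conn


def find_user_ids(conn: sqlite3.Connection, name: str) -> List[int]:
    """Look up ids by name; the `?` placeholder binds `name` as data."""
    cur = conn.execute(
        "SELECT id FROM users WHERE name = ? ORDER BY id", (name,)
    )
    return [row[0] for row in cur.fetchall()]


conn = make_demo_db()
normal = find_user_ids(conn, "alice")
# A classic injection payload is compared as a literal string and matches
# no rows, instead of rewriting the WHERE clause.
attack = find_user_ids(conn, "alice' OR '1'='1")
```

MySQL drivers that offer client-side prepared statements achieve a similar
effect by escaping the bound values before sending the final statement to the
server.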
7. InternalErrorCode
At present, the ErrorCode values in Doris are rather confusing, which makes it
hard for programs to consume them and determine what went wrong. In the
future, we will sort out the error codes and provide more standardized error
messages.
#6357
8. Import performance optimization
The import performance of Doris still has a lot of room for optimization,
especially in building the in-memory structure and in the disk writing stage.
This needs further analysis.
#6398
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.