Zhijing Lu created COMDEV-511:
---------------------------------
Summary: [GSoC][Doris]Dictionary Encoding Acceleration
Key: COMDEV-511
URL: https://issues.apache.org/jira/browse/COMDEV-511
Project: Community Development
Issue Type: Task
Components: GSoC/Mentoring ideas
Reporter: Zhijing Lu
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a
unified platform that supports multiple data processing scenarios, it ensures
high performance for low-latency and high-throughput queries, allows for easy
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*
In Apache Doris, dictionary encoding is performed during data writing and
compaction. It is applied to string data types by default. The dictionary
size of a column for one segment is 1M at most. Dictionary encoding
accelerates queries on string columns by, for example, converting the strings
into INT values.
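
The general idea can be illustrated with a minimal Python sketch. It is not the Apache Doris implementation; the function names {{dictionary_encode}} and {{filter_equals}} and the sample data are hypothetical and chosen only for illustration. Each distinct string gets one integer code, so an equality filter needs a single string lookup and then compares integers only.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes): each distinct string gets one integer code."""
    code_of: Dict[str, int] = {}
    dictionary: List[str] = []
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            # First time this string is seen: assign the next free code.
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int], needle: str) -> List[int]:
    """Return the row indices where the column equals `needle`."""
    try:
        target = dictionary.index(needle)  # the only string comparison per query
    except ValueError:
        return []  # value is absent from the dictionary, so no row can match
    # Every per-row comparison below is integer against integer.
    return [row for row, code in enumerate(codes) if code == target]


# Hypothetical sample column, for illustration only.
column = ["beijing", "shanghai", "beijing", "shenzhen", "shanghai", "beijing"]
dictionary, codes = dictionary_encode(column)
matching_rows = filter_equals(dictionary, codes, "beijing")
```

In this sketch the dictionary grows with the number of distinct values, while the code list grows with the number of rows, which is why dictionary size matters for memory use.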
h3. *Task*
* Phase One: Get familiar with the implementation of Apache Doris dictionary
encoding; learn how Apache Doris dictionary encoding accelerates queries.
* Phase Two: Evaluate the effectiveness of full dictionary encoding and
figure out how to optimize memory in such a case.
h3. *Learning Material*
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. *Mentor*
* Mentor: Chen Zhang, Apache Doris Committer,
[[email protected]|mailto:[email protected]]
* Mentor: Zhijing Lu, Apache Doris Committer,
[[email protected]|mailto:[email protected]]
* Mailing List: [email protected]