morningman opened a new issue #1797: Improve the load performance of large file
URL: https://github.com/apache/incubator-doris/issues/1797
 
 
   The current load process is:
   
   `Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk`
   
   In the path `Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk`, the following operations are performed:
   
   1. Insert each tuple into a memtable selected by tablet ID.
   2. When a memtable's size reaches the threshold, write it to disk.
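
As a rough illustration of the two steps above, here is a minimal Python sketch of the current behaviour (not the actual Doris C++ code). `SyncDeltaWriter`, `load_sync`, the row-count threshold, and the list standing in for the disk are all hypothetical names and simplifications chosen for this example.

```python
SYNC_FLUSH_THRESHOLD = 3  # hypothetical: rows per memtable before a flush


class SyncDeltaWriter:
    """Hypothetical stand-in for one tablet's writer."""

    def __init__(self, tablet_id, disk):
        self.tablet_id = tablet_id
        self.memtable = []   # rows buffered in memory
        self.disk = disk     # shared list standing in for files on disk

    def write(self, row):
        self.memtable.append(row)
        if len(self.memtable) >= SYNC_FLUSH_THRESHOLD:
            # The flush runs on the caller's thread, so no further row
            # can be inserted until this "disk write" has finished.
            self.disk.append((self.tablet_id, self.memtable))
            self.memtable = []


def load_sync(rows):
    """Route (tablet_id, row) pairs to per-tablet writers, one at a time."""
    disk = []
    writers = {}
    for tablet_id, row in rows:
        if tablet_id not in writers:
            writers[tablet_id] = SyncDeltaWriter(tablet_id, disk)
        writers[tablet_id].write(row)
    return disk, writers
```

In this sketch a slow `disk.append` would stall every subsequent insert, for every tablet.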
   
   For a single load task, the operations above effectively run on a single thread.
   In fact, memtable insertion and memtable flushing can be executed asynchronously.
   Performing them in separate threads prevents memtable insertion from being delayed by slow disk writes.
   
   In the new implementation, each TabletsChannel will have a flush thread and a flush queue.
   Every DeltaWriter in the TabletsChannel will place its memtable in the flush queue when the memtable is full, and create a new memtable for incoming data.
   The flush thread asynchronously flushes the memtables in the queue in turn.
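
The proposed design could look roughly like the following Python sketch (again an illustration, not the Doris implementation). The class and method names, the row-count threshold, the `None` shutdown sentinel, and the flush of partially filled memtables in `close()` are assumptions made for this example. The issue itself only specifies one flush thread and one flush queue per TabletsChannel.

```python
import queue
import threading

ASYNC_FLUSH_THRESHOLD = 3  # hypothetical: rows per memtable before a flush


class TabletsChannelSketch:
    """Hypothetical channel with one flush queue and one flush thread."""

    def __init__(self):
        self.flush_queue = queue.Queue()
        self.disk = []        # stand-in for disk; written only by the flush thread
        self.memtables = {}   # tablet_id -> rows buffered in memory
        self.flush_thread = threading.Thread(target=self._flush_loop)
        self.flush_thread.start()

    def _flush_loop(self):
        # Flush queued memtables one by one, in FIFO order.
        while True:
            item = self.flush_queue.get()
            if item is None:  # sentinel: no more memtables will arrive
                break
            self.disk.append(item)  # stand-in for the slow disk write

    def add_row(self, tablet_id, row):
        memtable = self.memtables.setdefault(tablet_id, [])
        memtable.append(row)
        if len(memtable) >= ASYNC_FLUSH_THRESHOLD:
            # Hand the full memtable to the flush thread and keep inserting
            # into a fresh one without waiting for the disk.
            self.flush_queue.put((tablet_id, memtable))
            self.memtables[tablet_id] = []

    def close(self):
        # Assumption: remaining partial memtables are flushed on close.
        for tablet_id, memtable in self.memtables.items():
            if memtable:
                self.flush_queue.put((tablet_id, memtable))
        self.memtables = {}
        self.flush_queue.put(None)
        self.flush_thread.join()
        return self.disk
```

An unbounded queue keeps the sketch short. A real implementation would likely need to bound the queue or otherwise limit memory, since inserts can outpace the disk.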
   
   This design improves the performance of loading large files.
   In single-host testing, the time to load a 1 GB text file dropped from 48 seconds to 29 seconds, a reduction of about 40%.
