morningman opened a new issue #1797: Improve the load performance of large file
URL: https://github.com/apache/incubator-doris/issues/1797

The current load process is:

`Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk`

In the path `Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk`, the following operations are performed:

1. Insert tuples into different memtables according to tablet ID.
2. When a memtable's size reaches the threshold, write it to disk.

For a single load task, these operations are effectively executed in a single thread. In fact, inserting into a memtable and flushing a memtable can be executed asynchronously. Performing them in separate threads prevents memtable insertion from being delayed by slow disk writes.

In the new implementation, each TabletsChannel will have a flush thread and a flush queue. When a memtable is full, its DeltaWriter places it in the TabletsChannel's flush queue and creates a new memtable for incoming data. The flush thread asynchronously flushes the memtables in the queue in order.

This design can improve the performance of loading large files. In single-host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.
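The proposed design could be sketched roughly as follows. This is a minimal Python illustration of the producer/consumer pattern described above, not the actual Doris code: the class names mirror the issue, but `flush_fn`, `max_rows`, and the `_STOP` sentinel are hypothetical, and a row count stands in for the real memtable size threshold.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel telling the flush thread to exit


class TabletsChannel:
    """Owns one flush queue and one flush thread shared by its writers."""

    def __init__(self, flush_fn):
        self.flush_queue = queue.Queue()
        self._flush_fn = flush_fn  # stands in for the disk write
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def _flush_loop(self):
        # Flush queued memtables one at a time, in arrival order.
        while True:
            item = self.flush_queue.get()
            if item is _STOP:
                break
            tablet_id, memtable = item
            self._flush_fn(tablet_id, memtable)

    def close(self):
        # Everything queued before the sentinel is flushed before we return.
        self.flush_queue.put(_STOP)
        self._thread.join()


class DeltaWriter:
    """Buffers rows for one tablet and hands full memtables to the channel."""

    def __init__(self, tablet_id, channel, max_rows=2):
        self.tablet_id = tablet_id
        self.channel = channel
        self.max_rows = max_rows  # assumption: row count replaces byte size
        self.memtable = []

    def write(self, row):
        self.memtable.append(row)
        if len(self.memtable) >= self.max_rows:
            self._submit()

    def _submit(self):
        # Enqueue the full memtable and start a new one right away, so
        # inserts do not wait for the disk write to finish.
        self.channel.flush_queue.put((self.tablet_id, self.memtable))
        self.memtable = []

    def close(self):
        # Hand over any partially filled memtable.
        if self.memtable:
            self._submit()
```

In this sketch a writer returns from `write` as soon as the full memtable is enqueued, so a slow `flush_fn` delays only the flush thread. The unbounded queue is a simplification; a real implementation would likely need to bound the memory held by memtables waiting to be flushed.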
