About the design and development of merge.

江天 Sun, 07 Jul 2019 23:54:36 -0700

Greetings,

Although the new storage engine is online, some old functionality has been 
lost, for example, merge. As you may know, the current storage engine consists 
of two kinds of data files: sequential files and unsequential files. A data 
point with a given timestamp, say t0, may occur zero or one time in sequential 
files and zero or more times in unsequential files. During a query, all data 
with the same timestamp are read from the files, and only the newest version 
(the most recently inserted one) is returned as the result. This means that 
outdated data in old files may degrade query performance. Besides, keeping 
data in different files incurs more disk seeks per query, which significantly 
hurts performance.
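
To make the read path described above concrete, here is a minimal Python 
sketch. It is not the engine's actual code: the Point type, its version field 
(a hypothetical insertion counter) and the read_latest function are all 
assumptions made for illustration, and files are modelled as in-memory lists.

```python
from typing import List, NamedTuple, Optional


class Point(NamedTuple):
    timestamp: int
    value: float
    version: int  # hypothetical insertion counter: larger means inserted later


def read_latest(files: List[List[Point]], timestamp: int) -> Optional[Point]:
    """Scan every file and keep only the newest version of the point."""
    newest: Optional[Point] = None
    for points in files:
        for p in points:
            if p.timestamp != timestamp:
                continue
            # Every stale copy still has to be read before it can be discarded.
            if newest is None or p.version > newest.version:
                newest = p
    return newest


# One sequential file and two unsequential files that all contain t0 = 1.
sequential = [Point(1, 10.0, 1), Point(2, 20.0, 2)]
unsequential_a = [Point(1, 11.0, 3)]
unsequential_b = [Point(1, 12.0, 4)]

all_files = [sequential, unsequential_a, unsequential_b]
result = read_latest(all_files, 1)
```

In this toy example three copies of the point at timestamp 1 are read, but 
only the one from unsequential_b is returned, which is the wasted work that 
outdated data causes.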


To avoid these disadvantages of keeping data out of order across different 
files, we introduce a process called merge (also called compaction in other 
LSM systems) that reads data from multiple time-overlapping files and rewrites 
it into a new file, which preserves better time order and contains no 
duplicated data.
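
As a starting point for discussion, the idea can be sketched in a few lines of 
Python. This is only an illustration under simplifying assumptions, not a 
proposed design: files are in-memory lists, each record carries a hypothetical 
version number that grows with insertion order, and merge_files is a made-up 
name. A real implementation would have to stream data from disk.

```python
from typing import Dict, List, Tuple

# (timestamp, value, version); version is a hypothetical insertion counter.
Record = Tuple[int, float, int]


def merge_files(files: List[List[Record]]) -> List[Tuple[int, float]]:
    """Rewrite overlapping files into one time-ordered, duplicate-free list."""
    newest: Dict[int, Tuple[int, float]] = {}  # timestamp -> (version, value)
    for records in files:
        for ts, value, version in records:
            # Keep only the most recently inserted value for each timestamp.
            if ts not in newest or version > newest[ts][0]:
                newest[ts] = (version, value)
    # Emit the surviving points in timestamp order.
    return [(ts, newest[ts][1]) for ts in sorted(newest)]


seq_file = [(1, 10.0, 1), (3, 30.0, 2)]
unseq_file = [(2, 20.0, 3), (3, 31.0, 4)]  # overlaps seq_file in time

merged = merge_files([seq_file, unseq_file])
```

Here the outdated value 30.0 at timestamp 3 is dropped, and the out-of-order 
point at timestamp 2 lands in its proper place in the new file.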

Providing an efficient way to make data more compact is no easy task. If you 
are interested or have studied compaction in some LSM systems, please join the 
discussion in this thread and give us your valuable advice.

Many thanks,

Tian Jiang
