Greetings,

Although the new storage engine is online, some old functionality is missing, for example, merge. As you may know, the current storage engine consists of two kinds of data files: sequential files and unsequential files. A data point with a given timestamp, say t0, may occur zero or one time in sequential files and zero or many times in unsequential files. During a query, all data with the same timestamp are read from the files, and only the newest version (the latest inserted one) is returned as the result. This means that outdated data in old files may degrade query performance. Moreover, keeping data spread across different files incurs more disk seeks per query, which significantly hurts performance.
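To make the resolution rule concrete, here is a minimal sketch (hypothetical names, not IoTDB's actual code) of how a query can pick the latest-inserted version when a timestamp appears in both sequential and unsequential files. Each point is assumed to carry an insertion version:

```python
# Hypothetical sketch of query-time version resolution, NOT IoTDB's implementation.
# Each data point is a (timestamp, version, value) tuple; a higher version
# means the point was inserted later.

def resolve(seq_points, unseq_points):
    """Return one value per timestamp, keeping the latest-inserted version."""
    latest = {}  # timestamp -> (version, value)
    for ts, version, value in seq_points + unseq_points:
        if ts not in latest or version > latest[ts][0]:
            latest[ts] = (version, value)
    return {ts: value for ts, (version, value) in sorted(latest.items())}

# Timestamp 100 occurs once in a sequential file and twice in unsequential
# files; the query must read all three copies but returns only version 3.
seq = [(100, 1, 25.0), (200, 1, 26.0)]
unseq = [(100, 2, 25.5), (100, 3, 24.8)]
print(resolve(seq, unseq))  # {100: 24.8, 200: 26.0}
```

Note that even though only one value per timestamp is returned, every copy still has to be read and compared, which is exactly the overhead described above.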
To avoid these disadvantages of keeping data disorderly across different files, we introduce a process called merge (also called compaction in other LSM systems), which reads data from multiple time-overlapping files and rewrites it to a new file that preserves time order and contains no duplicated data. Providing an efficient way to make data more compact is no easy task. If you are interested, or have studied compaction in other LSM systems, please join the discussion in this thread and share your advice.

Many thanks,
Tian Jiang
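P.S. For concreteness, here is a hypothetical sketch of the merge idea (assumed names and data layout, not IoTDB's implementation): a k-way merge of several time-overlapping files into one file that is sorted by timestamp and keeps only the latest-inserted version of each point:

```python
import heapq

# Hypothetical compaction sketch, NOT IoTDB's implementation. Each input file
# is a list of (timestamp, version, value) tuples already sorted by timestamp.

def merge_files(files):
    """K-way merge time-overlapping files; keep the newest version per timestamp."""
    merged = []
    # heapq.merge yields tuples in sorted order; ties on timestamp are broken
    # by version, so the newest copy of a timestamp arrives last.
    for ts, version, value in heapq.merge(*files):
        if merged and merged[-1][0] == ts:
            merged[-1] = (ts, version, value)  # newer version overwrites older
        else:
            merged.append((ts, version, value))
    return merged

old_seq = [(100, 1, 25.0), (200, 1, 26.0)]
unseq_a = [(100, 2, 25.5), (150, 2, 25.2)]
print(merge_files([old_seq, unseq_a]))
# [(100, 2, 25.5), (150, 2, 25.2), (200, 1, 26.0)]
```

After such a merge, a query touches a single time-ordered file and never reads an outdated copy, which is the performance benefit argued for above.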