Xiangdong Huang created IOTDB-1131:
--------------------------------------

             Summary: dictionary encoding of deviceID and measurementID in WAL
                 Key: IOTDB-1131
                 URL: https://issues.apache.org/jira/browse/IOTDB-1131
             Project: Apache IoTDB
          Issue Type: Improvement
          Components: WAL
            Reporter: Xiangdong Huang


This is an interesting idea proposed by Tian Jiang.

Copied from Tian Jiang:

Write-ahead logs (WALs) ensure that data which has not yet been persisted can 
still be recovered after a system failure, which increases the durability of a 
DBMS. However, WALs generally require more frequent flushes to limit the 
possibility of losing data, which increases disk utilization significantly, as 
each flush requires one disk I/O. Moreover, logs are hardly compressed or 
encoded the way raw data in TsFiles is, and the result is that logs containing 
the same data consume much more space than the data chunks. The disadvantages 
are twofold: first, large logs compete for more disk bandwidth, slowing down 
the persistence of raw data; second, even if WALs are placed on another disk 
(possibly an SSD for high throughput), WALs are removed frequently once their 
corresponding data is persisted, and such frequent write-and-erase cycles 
shorten disk life, especially for SSDs.

So it is beneficial to reduce the size of WALs. In IoTDB (and also in other 
DBMSs), the majority of WAL entries are logs of insertions, as other operations 
like deletions and updates are often rare compared with insertions. This 
observation suggests that we may focus on reducing the size of insertion logs, 
which is enough to attain the desired improvement for the whole system. 
Currently, we serialize complete physical plans into the WAL, but we notice 
that although values and timestamps generally vary from plan to plan, header 
information like deviceIds, measurementIds and data types is highly redundant, 
and sometimes deviceIds and measurementIds are long strings, which may consume 
a significant amount of space. So in this design, we concentrate on reducing 
duplicated deviceIds, measurementIds and data types in WALs.

Method
To reduce duplicated deviceIds, measurementIds and data types in WALs, we use 
a windowed differentiation technique (or referencing) to replace redundant 
fields with an index pointing to a base log, if such a log can be found within 
a given window. The detailed procedure is described below:
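
As a rough illustration of the referencing idea only (this is not the procedure 
from the original design), the Java sketch below keeps a sliding window of the 
most recent log headers and emits a back-reference when an identical header is 
found in it. Everything here is an assumption made for the example: the class 
name WindowedHeaderCodec, the NO_BASE marker, the flattening of a header into a 
list of strings, and the choice to match whole headers rather than individual 
fields. It does not reflect IoTDB's actual WAL format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of windowed referencing; not IoTDB's WAL code.
final class WindowedHeaderCodec {
    // Marker meaning "no base log in the window, write the full header".
    static final int NO_BASE = -1;

    private final int windowSize;
    // Headers of the most recent logs, oldest first. A header is assumed to be
    // a flat list such as [deviceId, measurementIds, dataTypes].
    private final List<List<String>> window = new ArrayList<>();

    WindowedHeaderCodec(int windowSize) {
        this.windowSize = windowSize;
    }

    // Writer side: returns how many logs back an identical header was seen
    // (1 = the previous log), or NO_BASE if the full header must be written.
    int encode(List<String> header) {
        int ref = NO_BASE;
        for (int i = window.size() - 1; i >= 0; i--) {
            if (window.get(i).equals(header)) {
                ref = window.size() - i;
                break;
            }
        }
        push(header);
        return ref;
    }

    // Reader side: resolves a back-reference against its own window, or
    // accepts the full header when ref == NO_BASE.
    List<String> decode(int ref, List<String> fullHeader) {
        List<String> header =
            (ref == NO_BASE) ? fullHeader : window.get(window.size() - ref);
        push(header);
        return header;
    }

    // Both sides record every header so their windows stay identical.
    private void push(List<String> header) {
        window.add(header);
        if (window.size() > windowSize) {
            window.remove(0);
        }
    }
}
```

In this sketch the writer and the reader each hold their own instance with the 
same window size. A header that has slid out of the window is written in full 
again, so the window size trades memory and lookup cost against how often long 
deviceIds and measurementIds must be repeated.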





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
