Xiangdong Huang created IOTDB-1131:
--------------------------------------
Summary: dictionary encoding of deviceID and measurementID in WAL
Key: IOTDB-1131
URL: https://issues.apache.org/jira/browse/IOTDB-1131
Project: Apache IoTDB
Issue Type: Improvement
Components: WAL
Reporter: Xiangdong Huang
This is an interesting idea proposed by Tian Jiang.
Copied from Tian Jiang:
Write-ahead logs (WALs) ensure that data which are not yet persisted can still
be recovered after a system failure, thereby increasing the durability of a
DBMS. However, WALs generally require more frequent flushes to limit the
possibility of losing data, which increases disk utilization significantly, as
each flush requires one disk I/O. Moreover, logs are hardly compressed or
encoded the way raw data in TsFiles are, and the result is that logs containing
the same data consume much more space than the data chunks. The disadvantages
are twofold: first, large logs compete for more disk bandwidth, slowing down
the persistence of raw data; second, even if WALs are placed on another disk
(possibly an SSD for high throughput), WALs are removed frequently once their
corresponding data are persisted, and such frequent write-and-erase cycles
shorten disk life, especially for SSDs.
So it is beneficial to reduce the size of WALs. In IoTDB (and also other
DBMSs), the majority of WAL entries are logs of insertions, as other operations
like deletions and updates are rare compared with insertions. This observation
suggests that we may focus on reducing the size of insertion logs, which is
enough to attain the desired improvement for the whole system. Currently, we
serialize complete physical plans into the WAL, but we notice that although
values and timestamps generally vary from plan to plan, header information like
deviceIds, measurementIds and data types is highly redundant, and sometimes
deviceIds and measurementIds are long strings, which may consume a significant
amount of space. So in this design, we concentrate on reducing duplicated
deviceIds, measurementIds and data types in WALs.
Method
To reduce duplicated deviceIds, measurementIds and data types in WALs, we use a
windowed differencing (or referencing) technique to replace redundant fields
with an index pointing to a base log, if such a log can be found within a given
window. The detailed procedure is described below:
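
As a rough illustration of the referencing idea only (this is not the procedure
from the original design, whose details are not shown here), the following Java
sketch keeps a sliding window of the most recent insertion headers and emits a
back-reference when an identical header is still inside the window. All names
(WalHeaderReferenceSketch, Header, EncodedHeader, Window) are hypothetical and
do not correspond to IoTDB classes; the window size, the use of a "distance"
index, and matching on the whole header at once are assumptions. Requires
Java 16+ for records.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; none of these names come from IoTDB. */
class WalHeaderReferenceSketch {

  /** Header fields that tend to repeat across insertion logs. */
  record Header(String deviceId, List<String> measurementIds, List<String> dataTypes) {}

  /** Either a full header (a base log) or a back-reference into the window. */
  record EncodedHeader(Header full, int distance) {
    boolean isReference() {
      return full == null;
    }
  }

  /** Sliding window over the headers of the most recent logs, newest first. */
  static final class Window {
    private final int size;
    private final Deque<Header> recent = new ArrayDeque<>();

    Window(int size) {
      this.size = size;
    }

    /** Returns how many logs back an equal header sits, or -1 if none is in the window. */
    int find(Header h) {
      int distance = 0;
      for (Header candidate : recent) {
        if (candidate.equals(h)) {
          return distance;
        }
        distance++;
      }
      return -1;
    }

    /** Resolves a back-reference; distance 0 is the immediately preceding log. */
    Header get(int distance) {
      int i = 0;
      for (Header h : recent) {
        if (i++ == distance) {
          return h;
        }
      }
      throw new IllegalArgumentException("reference outside window: " + distance);
    }

    /** Records the header of the log just processed and evicts the oldest if needed. */
    void push(Header h) {
      recent.addFirst(h);
      if (recent.size() > size) {
        recent.removeLast();
      }
    }
  }

  /** Writer side: replace the header with a reference when a base log is in the window. */
  static EncodedHeader encode(Window w, Header h) {
    int distance = w.find(h);
    w.push(h);
    return distance >= 0 ? new EncodedHeader(null, distance) : new EncodedHeader(h, -1);
  }

  /** Reader side: mirror the writer's window so references resolve to the same base log. */
  static Header decode(Window w, EncodedHeader e) {
    Header h = e.isReference() ? w.get(e.distance()) : e.full();
    w.push(h);
    return h;
  }

  public static void main(String[] args) {
    Header h = new Header("root.sg.d1", List.of("s1", "s2"), List.of("INT32", "FLOAT"));
    Window writer = new Window(4);
    Window reader = new Window(4);
    for (int i = 0; i < 3; i++) {
      EncodedHeader e = encode(writer, h);
      Header back = decode(reader, e);
      System.out.println((e.isReference() ? "reference" : "full") + " -> " + back.deviceId());
    }
  }
}
```

In this sketch the writer and the reader must process logs in the same order
with the same window size, otherwise references resolve to the wrong base log.
A real design would also have to decide what happens when the base log is
removed or a new WAL file starts; the sketch does not address that.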