[
https://issues.apache.org/jira/browse/IOTDB-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272686#comment-17272686
]
Xiangdong Huang commented on IOTDB-1131:
----------------------------------------
see https://github.com/apache/iotdb/pull/1869 for the draft.
> dictionary encoding of deviceID and measurementID in WAL
> --------------------------------------------------------
>
> Key: IOTDB-1131
> URL: https://issues.apache.org/jira/browse/IOTDB-1131
> Project: Apache IoTDB
> Issue Type: Improvement
> Components: WAL
> Reporter: Xiangdong Huang
> Priority: Major
>
> This is an interesting idea proposed by Tian Jiang.
> Copied from Tian Jiang:
> Write-ahead logs (WALs) ensure that data which have not yet been persisted
> can still be recovered after a system failure, thus increasing the durability
> of a DBMS. However, WALs generally require more frequent flushes to limit the
> possibility of losing data, which increases disk utilization significantly as
> each flush requires one disk I/O. Moreover, logs are hardly compressed or
> encoded the way the raw data in TsFiles are, and the result is that logs
> containing the same data consume much more space than the data chunks.
> The disadvantages are twofold: first, large logs compete for more disk
> bandwidth, slowing down the persistence of raw data; second, even if WALs are
> placed on another disk (possibly an SSD for high throughput), WALs are
> removed frequently once their corresponding data are persisted, and such
> frequent write-and-erase cycles shorten disk life, especially for SSDs.
> So it is beneficial to reduce the size of WALs. In IoTDB (and also other
> DBMSs), the majority of WAL entries are logs of insertions, as other
> operations like deletions and updates are often rare compared with
> insertions. This observation suggests that we may focus on reducing the size
> of insertion logs, which is enough to attain a substantial improvement for
> the whole system. Currently, we serialize complete physical plans into the
> WAL, but we notice that although values and timestamps generally vary from
> plan to plan, header information like deviceIds, measurementIds and data
> types is highly redundant, and sometimes deviceIds and measurementIds are
> long strings, which may consume a significant amount of space. So in this
> design, we concentrate on reducing duplicated deviceIds, measurementIds and
> data types in WALs.
> Method
> To reduce duplicated deviceIds, measurementIds and data types in WALs, we use
> a windowed differentiation technique (or referencing) to replace redundant
> fields with an index pointing to a base log, if such a log can be found within
> a given window. The detailed procedure is described below:
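
The detailed procedure is not included in this message, so the following is only a minimal sketch of the general idea of windowed referencing, not the design in the pull request. It assumes a hypothetical log layout of (deviceId, measurementIds, dataTypes, timestamp, values), a hypothetical window size of 4, and that the index is the distance back to the most recent log with an identical header. The names `encode` and `decode` are illustrative and are not IoTDB APIs.

```python
from collections import deque


def encode(logs, window=4):
    """Replace a repeated header with the distance back to a matching base log.

    Assumption: each log is (device_id, measurement_ids, data_types, time, values).
    """
    recent = deque(maxlen=window)  # headers of the last `window` logs
    entries = []
    for device, measurements, types, time, values in logs:
        header = (device, tuple(measurements), tuple(types))
        # Search the window from newest to oldest for an identical header.
        distance = next(
            (d for d, h in enumerate(reversed(recent), 1) if h == header), None
        )
        if distance is None:
            entries.append(("full", header, time, values))   # no base log in window
        else:
            entries.append(("ref", distance, time, values))  # header replaced by index
        recent.append(header)
    return entries


def decode(entries, window=4):
    """Rebuild the original logs by mirroring the encoder's window."""
    recent = deque(maxlen=window)
    logs = []
    for kind, head, time, values in entries:
        # A "ref" entry stores the distance back to its base log's header.
        header = head if kind == "full" else recent[-head]
        device, measurements, types = header
        logs.append((device, list(measurements), list(types), time, values))
        recent.append(header)
    return logs
```

In this sketch the decoder must use the same window size as the encoder, since a reference is only meaningful relative to the logs that preceded it. A larger window finds more base logs but costs more to search, and that trade-off is left open here.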