[ 
https://issues.apache.org/jira/browse/IOTDB-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272686#comment-17272686
 ] 

Xiangdong Huang commented on IOTDB-1131:
----------------------------------------

See https://github.com/apache/iotdb/pull/1869 for the draft.

> dictionary encoding of deviceID and measurementID in WAL
> --------------------------------------------------------
>
>                 Key: IOTDB-1131
>                 URL: https://issues.apache.org/jira/browse/IOTDB-1131
>             Project: Apache IoTDB
>          Issue Type: Improvement
>          Components: WAL
>            Reporter: Xiangdong Huang
>            Priority: Major
>
> This is an interesting idea proposed by Tian Jiang.
> Copied from Tian Jiang:
> Write-ahead logs (WALs) ensure that data that has not yet been persisted can
> still be recovered after a system failure, thus increasing the durability of
> a DBMS. However, WALs generally require more frequent flushes to limit the
> possibility of losing data, which increases disk utilization significantly,
> as each flush requires one disk I/O. Moreover, logs are rarely compressed or
> encoded the way raw data in TsFiles is, and the result is that logs
> containing the same data consume much more space than the data chunks.
> The disadvantages are twofold: first, large logs compete for more disk
> bandwidth, slowing down the persistence of raw data; second, even if WALs are
> placed on another disk (possibly an SSD for high throughput), WALs are
> removed frequently once their corresponding data is persisted, and such
> frequent write-and-erase cycles shorten disk life, especially for SSDs.
> So it is beneficial to reduce the size of WALs. In IoTDB (and also other
> DBMSs), the majority of WAL entries are insertion logs, as other operations
> like deletions and updates are often rare compared with insertions. This
> observation suggests that we may focus on reducing the size of insertion
> logs, which is enough to attain a substantial improvement for the whole
> system. Currently, we serialize complete physical plans into the WAL, but we
> notice that although values and timestamps generally vary from plan to plan,
> header information like deviceIds, measurementIds and data types is highly
> redundant, and sometimes deviceIds and measurementIds are long strings, which
> may consume a significant amount of space. So in this design, we concentrate
> on reducing duplicated deviceIds, measurementIds and data types in WALs.
> Method
> To reduce duplicated deviceIds, measurementIds and data types in WALs, we use
> a windowed differentiation technique (or referencing) to replace redundant
> fields with an index pointing to a base log, if such a log can be found within
> a given window. The detailed procedure is described below:
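
The detailed procedure is not included in this message. As a rough
illustration of the general referencing idea only, and not the actual IoTDB
design or the code in the linked draft, a minimal Python sketch might look
like the following. The names Header, encode and decode, the ("full", ...) and
("ref", ...) entry shapes, and the default window size are all hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Header(NamedTuple):
    """Redundant part of an insertion log (hypothetical layout)."""
    device_id: str
    measurement_ids: tuple
    data_types: tuple


def encode(logs, window=4):
    """Replace a header by ("ref", distance) if an identical header
    appears among the previous `window` logs; otherwise keep it in full."""
    recent = deque(maxlen=window)  # headers of the latest logs, oldest first
    out = []
    for header, timestamp, values in logs:
        head = ("full", header)
        # Search from the newest log backwards; distance 1 = previous log.
        for distance, prev in enumerate(reversed(recent), start=1):
            if prev == header:
                head = ("ref", distance)
                break
        out.append((head, timestamp, values))
        recent.append(header)
    return out


def decode(entries, window=4):
    """Inverse of encode; must use the same window size."""
    recent = deque(maxlen=window)
    logs = []
    for (kind, payload), timestamp, values in entries:
        header = recent[-payload] if kind == "ref" else payload
        logs.append((header, timestamp, values))
        recent.append(header)
    return logs


if __name__ == "__main__":
    h1 = Header("root.sg.d1", ("s1", "s2"), ("INT32", "FLOAT"))
    h2 = Header("root.sg.d2", ("s1",), ("INT64",))
    logs = [(h1, 1, (1, 2.0)), (h2, 1, (5,)), (h1, 2, (3, 4.0))]
    encoded = encode(logs)
    assert decode(encoded) == logs  # round trip restores the full headers
```

In this sketch the third log stores only a small back-reference instead of
repeating the device and measurement strings; a log whose header last
appeared outside the window is written in full again.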



--
This message was sent by Atlassian Jira
(v8.3.4#803005)