Tian Jiang created IOTDB-853:
--------------------------------
Summary: Log compaction by omitting the same log fields
Key: IOTDB-853
URL: https://issues.apache.org/jira/browse/IOTDB-853
Project: Apache IoTDB
Issue Type: Improvement
Components: Core/WAL
Reporter: Tian Jiang
[1] describes an interesting log compaction technique: it records the page Id
and txn Id of the previous log and omits them from the next log if they are the
same.
I think this technique could well be applied to IoTDB's WAL. While persisting
logs, we may keep a window of the previous N logs. When we are about to persist
a log, we search the window for the nearest log of the same type and check
whether it has the same field values as the current one. For example, it is
quite likely that neighboring insertions share the same deviceIds and
measurementIds, so we can fill the log field with a backward reference instead
of the value itself (e.g., "3" means that this field has the same value as in
the log whose index is smaller than the current one by 3). This way, a very
long path can be replaced by a single byte (0~255), which may save a
considerable amount of disk space and I/O.
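A minimal sketch of the window lookup described above, for a single field of a
single log type. This is not existing IoTDB code: the class name BackRefWindow,
its methods, and the convention that 0 means "no match, write the full value"
are all assumptions made for illustration. Distances here count logs inside the
window (1 = most recent), which corresponds to the index difference only when
every log in the window has the same type.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical helper, not part of IoTDB: a window over one field's last N values. */
final class BackRefWindow {
    private final int windowSize;                            // N, kept <= 255 so a distance fits in one byte
    private final Deque<String> window = new ArrayDeque<>(); // newest value first

    BackRefWindow(int windowSize) {
        if (windowSize < 1 || windowSize > 255) {
            throw new IllegalArgumentException("windowSize must be in 1..255");
        }
        this.windowSize = windowSize;
    }

    /**
     * Writer side. Returns the distance (1..N) to the nearest earlier log with
     * the same value, or 0 if there is none and the full value must be written.
     */
    int encode(String value) {
        int distance = 0;
        int i = 1;
        for (String previous : window) {       // iterates from newest to oldest
            if (previous.equals(value)) {
                distance = i;
                break;
            }
            i++;
        }
        push(value);
        return distance;
    }

    /**
     * Reader side. Resolves a distance produced by encode(); when the distance
     * is 0, the literal read from the log is used as the value.
     */
    String decode(int distance, String literal) {
        String value = literal;
        if (distance > 0) {
            if (distance > window.size()) {
                throw new IllegalArgumentException("reference outside window");
            }
            Iterator<String> it = window.iterator();
            for (int i = 0; i < distance; i++) {
                value = it.next();             // after `distance` steps: the referenced value
            }
        }
        push(value);
        return value;
    }

    // Both sides update the window identically so that distances stay in sync.
    private void push(String value) {
        window.addFirst(value);
        if (window.size() > windowSize) {
            window.removeLast();
        }
    }
}
```

The writer and the reader each hold their own BackRefWindow and feed it every
log in order. The linear scan costs O(N) string comparisons per field, which is
exactly the overhead that the choice of window length has to keep in check.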
The idea itself is easy to implement; the challenge lies in choosing a proper
window length and comparing logs efficiently, so that the additional
computation does not become another bottleneck.
[1] Michael Haubenschild, Caetano Sauer, Thomas Neumann, and Viktor Leis. 2020.
Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage
Engines. In Proceedings of the 2020 ACM SIGMOD International Conference on
Management of Data (SIGMOD '20). Association for Computing Machinery, New York,
NY, USA, 877–892. DOI:https://doi.org/10.1145/3318464.3389716