Tian Jiang created IOTDB-853:
--------------------------------

             Summary: Log compaction by omitting the same log fields
                 Key: IOTDB-853
                 URL: https://issues.apache.org/jira/browse/IOTDB-853
             Project: Apache IoTDB
          Issue Type: Improvement
          Components: Core/WAL
            Reporter: Tian Jiang


[1] describes an interesting log compaction technique: the logger remembers the
page ID and transaction ID of the previous log record and omits them from the
next record when they are the same.

I think this technique could also be applied to IoTDB's WAL. While persisting
logs, we could keep a window of the previous N logs. When a new log is about to
be persisted, we search the window for the nearest log of the same type and
check whether that log has the same field values as the current one. For
example, neighboring insertions are very likely to share the same deviceId and
measurementIds, so such a field can be filled with a reference to the earlier
log instead of the value itself (e.g., "3" means this field has the same value
as the log whose index is 3 less than the current one's). This way, a very long
path can be replaced by a single byte (0~255), which may save a great deal of
disk space and I/O.
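
A minimal Java sketch of the windowed encoding described above, assuming a
simplified log that carries only a type tag and one path field (deviceId). The
class, record, and method names are hypothetical and are not IoTDB's WAL API.
Reserving distance 0 as a "literal follows" marker is also an assumption, which
leaves 1~255 as usable reference distances. The encoder compares only against
the nearest log of the same type, as proposed; the decoder resolves each
reference against logs it has already decoded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class WindowedFieldEncoder {
    // Hypothetical, simplified log record: a type tag plus one path-like field (e.g. a deviceId).
    record LogEntry(int type, String deviceId) {}

    // Encoded field: distance > 0 is a back-reference; distance == 0 means "literal follows" (assumption).
    record EncodedField(int distance, String literal) {
        boolean isReference() { return distance > 0; }
    }

    private final int windowSize;                     // N; at most 255 so a distance fits in one byte
    private final Deque<LogEntry> window = new ArrayDeque<>();

    WindowedFieldEncoder(int windowSize) {
        if (windowSize < 1 || windowSize > 255) {
            throw new IllegalArgumentException("window size must be in 1..255");
        }
        this.windowSize = windowSize;
    }

    EncodedField encode(LogEntry current) {
        EncodedField result = new EncodedField(0, current.deviceId());
        int distance = 1;
        // Walk from the most recent log (distance 1) to the oldest one in the window.
        for (Iterator<LogEntry> it = window.descendingIterator(); it.hasNext(); distance++) {
            LogEntry previous = it.next();
            if (previous.type() == current.type()) {
                // Only the nearest log of the same type is compared.
                if (previous.deviceId().equals(current.deviceId())) {
                    result = new EncodedField(distance, null);
                }
                break;
            }
        }
        window.addLast(current);
        if (window.size() > windowSize) {
            window.removeFirst();                     // slide the window forward
        }
        return result;
    }

    // Replays encoded fields in log order, resolving each back-reference against earlier output.
    static List<String> decode(List<EncodedField> encoded) {
        List<String> decoded = new ArrayList<>();
        for (EncodedField field : encoded) {
            decoded.add(field.isReference()
                    ? decoded.get(decoded.size() - field.distance())
                    : field.literal());
        }
        return decoded;
    }
}
```

For instance, with logs of types 1, 2, 1 whose paths are "root.sg.d1",
"root.sg.d9", "root.sg.d1", the third log's path is encoded as distance 2
rather than the full string. The linear scan costs up to N comparisons per
field, which is exactly the overhead the issue worries about below.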

The idea itself is easy to implement; the challenge lies in choosing a proper
window length and comparing logs efficiently, so that the additional
computation does not become another bottleneck.
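
One possible way to keep the comparison cost low, sketched here as an
assumption rather than a settled design: instead of scanning the window, keep a
hash map from (log type, field value) to the index of the last log that carried
that value, so each lookup is expected O(1). Note that this is a variant of the
proposal: it references the most recent same-typed log with an equal value
anywhere in the window, not only the nearest same-typed log. The class and
method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class HashedFieldEncoder {
    // Hypothetical lookup key: a log type paired with one field value (e.g. a device path).
    private record Key(int type, String value) {}

    private final int windowSize;                     // N; at most 255 so a distance fits in one byte
    private final Map<Key, Long> lastSeen = new HashMap<>();
    private long nextIndex = 0;                       // index of the next log to be persisted

    HashedFieldEncoder(int windowSize) {
        if (windowSize < 1 || windowSize > 255) {
            throw new IllegalArgumentException("window size must be in 1..255");
        }
        this.windowSize = windowSize;
    }

    /** Returns a back-reference distance in 1..windowSize, or 0 if the value must be written literally. */
    int encode(int type, String value) {
        long current = nextIndex++;
        Long previous = lastSeen.put(new Key(type, value), current);
        // Every windowSize logs, drop entries that have slid out of the window,
        // so the map holds at most about 2 * windowSize entries.
        if (current % windowSize == 0) {
            lastSeen.values().removeIf(index -> current - index > windowSize);
        }
        if (previous == null || current - previous > windowSize) {
            return 0;
        }
        return (int) (current - previous);
    }
}
```

The periodic purge amortizes to a small constant per log, so the window length
mainly trades memory and reference reach against the one-byte distance limit.
A decoder can resolve these distances the same way as for the scanning version.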

[1] Michael Haubenschild, Caetano Sauer, Thomas Neumann, and Viktor Leis. 2020. 
Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage 
Engines. In Proceedings of the 2020 ACM SIGMOD International Conference on 
Management of Data (SIGMOD '20). Association for Computing Machinery, New York, 
NY, USA, 877–892. DOI:https://doi.org/10.1145/3318464.3389716



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
