[jira] [Created] (IOTDB-1889) Adaptive TimeArray to reduce timestamp memory footprint

Tian Jiang (Jira) Mon, 25 Oct 2021 01:48:18 -0700

Tian Jiang created IOTDB-1889:
---------------------------------

             Summary: Adaptive TimeArray to reduce timestamp memory footprint
                 Key: IOTDB-1889
                 URL: https://issues.apache.org/jira/browse/IOTDB-1889
             Project: Apache IoTDB
          Issue Type: Improvement
          Components: Core/Engine
            Reporter: Tian Jiang



We use long arrays (64-bit integers) to store timestamps of timeseries points 
in memory. Although longs may fit almost all situations, they may be 
overfitting for general situations as consecutive timestamps usually have a 
small difference, which is around the sampling interval (a few microseconds to 
a few seconds). So it is straightforward that by only recording the differences 
of adjacent timestamps, we can reduce the memory footprint of MemTables 
greatly. For example, for double-typed timeseries, each time-value pair takes 
8+8=16 bytes, and if we can switch the datatype of timestamps to short (int16), 
6 bytes will be saved for each point, and it will result in a memory saving of 
37.5%.

The idea is intuitive, we record the full timestamp of the first point, and 
only record the differences of adjacent points in an integer array. Just like 
the current implementation, the timestamps are batched with fixed-length 
arrays, so for each such array, a full timestamp is maintained. Int64 is used 
for the first array, and during the construction of an array, we keep track of 
the largest difference, and if it can be recorded by a smaller integer, then we 
use a smaller integer for the next array. For example, assuming the length of 
each array is 4 and we have already written 5 timestamps: t0, t0+800, t0+1530, 
t0+2430, then we will record t0 outside the array, and 0, 800, 730, 900 in the 
array (we record a zero in the head to unify the access of timestamps, i.e., do 
not use if-else for the first timestamp). Next, we find that the maximum 
difference is 900, which can be represented by an int16, so we decide to use 
int16 for the next array. On the other side, when overflow occurs, we use a 
larger integer array and copy all values to the new array.

However, this optimization introduces two side-effects: first, as the full 
timestamp is not stored anymore, we must traverse from the first position of 
the array to calculate the true timestamps. Considering that such in-memory 
data points do not support methods of visiting other than traverse, this 
side-effect is not a big deal.
The second one is more threatening, we cannot sort the timestamps as previously 
for the same reason. The simplest way is to switch to the previous method 
(using int64 to store full timestamps) as soon as we detect any disorders. The 
second method is more complicated: since sorting algorithms basically consist 
of comparisons and swaps, to support comparison, we can store the differences 
between the first timestamps and other timestamps instead of the differences 
between two consecutive timestamps, so we can restore each timestamp in O(1) 
time and comparison is also affordable. As for swaps, if the first position is 
swapped, all other positions should be updated, otherwise, only the swapped 
position needs updating.

We call the method "adaptive" for two reasons: first, the type of integers may 
change as the differences of timestamps become larger or smaller; second, since 
storing differences adding the cost of sorting when there are disordered data, 
it should be disabled when the increased sorting cost overwhelms the benefit it 
brings.

Despite all, the method has great potential in situations where data comes in 
at a high frequency (so timestamp differences are small) and without any 
disorders, and it is totally compatible with the previous method.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (IOTDB-1889) Adaptive TimeArray to reduce timestamp memory footprint

Reply via email to