[
https://issues.apache.org/jira/browse/IOTDB-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099724#comment-17099724
]
Rui Liu commented on IOTDB-637:
-------------------------------
1. The original design of ActiveTimeSeriesCounter updates the active time
series ratio when a memtable is flushed in order to reduce overhead and improve
write performance (compared to more precise approaches, such as updating
whenever an insert plan is executed). The drawback is that this approach
reflects the number of active time series with a delay (time shift).
We assumed that in most applications the write workload is stable, and in such
scenarios this approach works fine.
However, we have since added the time range partition feature, and the design of
ActiveTimeSeriesCounter should be adapted accordingly. It seems that this
problem was not considered when the time partition feature was designed.
In any case, the art is to keep the cost of counting as low as possible
while keeping the counting function effective.
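One possible adaptation, sketched here only as an illustration and not as the actual IoTDB code, is to keep a separate counting state per memtable, so that flushing one memtable does not reset what another memtable of the same SG has already recorded. The class name, the memTableId parameter, and the use of exact sets in place of the HyperLogLog estimators are all assumptions made for this sketch.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one counting state per (storage group, memtable). */
class PerMemTableActiveCounter {
  // Exact sets stand in for the HyperLogLog estimators (assumption for this sketch).
  // The key is "storageGroup#memTableId".
  private final Map<String, Set<String>> activeSeries = new ConcurrentHashMap<>();

  private static String key(String storageGroup, long memTableId) {
    return storageGroup + "#" + memTableId;
  }

  /** Write path: record that a series received data in the given memtable. */
  void offer(String storageGroup, long memTableId, String seriesPath) {
    activeSeries
        .computeIfAbsent(key(storageGroup, memTableId), k -> ConcurrentHashMap.newKeySet())
        .add(seriesPath);
  }

  /**
   * Flush path: return the number of distinct series seen by this memtable and
   * drop only this memtable's state. Other memtables of the same storage group
   * keep their state. Assumes no more writes go to a memtable once it is flushing.
   */
  int flushAndReset(String storageGroup, long memTableId) {
    Set<String> flushed = activeSeries.remove(key(storageGroup, memTableId));
    return flushed == null ? 0 : flushed.size();
  }
}
```

The extra cost on the write path is one map lookup per insert, so whether this trade-off is acceptable would need to be measured.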
2. I agree with the lock improvement.
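As a rough illustration of the narrower lock scope, the sketch below reads the cardinality outside the lock, takes the lock only when the count has changed, and lets getActiveRatio() read the concurrent map without locking. It is a simplified stand-in rather than the real class: the class name and field names are hypothetical, and exact sets replace the HyperLogLog estimators.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the proposed lock scope; not the IoTDB implementation. */
class NarrowLockActiveCounter {
  // Exact sets stand in for the per-storage-group HyperLogLog estimators.
  private final Map<String, Set<String>> seriesPerGroup = new ConcurrentHashMap<>();
  private final Map<String, Long> activeNum = new ConcurrentHashMap<>();
  private final Map<String, Double> activeRatio = new ConcurrentHashMap<>();
  private final ReentrantLock lock = new ReentrantLock();

  void offer(String storageGroup, String seriesPath) {
    seriesPerGroup
        .computeIfAbsent(storageGroup, k -> ConcurrentHashMap.newKeySet())
        .add(seriesPath);
  }

  /** Lock-free read: the map itself is thread-safe. */
  double getActiveRatio(String storageGroup) {
    return activeRatio.getOrDefault(storageGroup, 0.0);
  }

  void updateActiveRatio(String storageGroup) {
    Set<String> seen = seriesPerGroup.get(storageGroup);
    // The (potentially expensive) cardinality computation happens outside the lock.
    long current = seen == null ? 0L : seen.size();
    if (current != activeNum.getOrDefault(storageGroup, 0L)) {
      lock.lock();
      try {
        // Only the recomputation of all ratios is serialized.
        activeNum.put(storageGroup, current);
        long total = 0;
        for (long n : activeNum.values()) {
          total += n;
        }
        for (Map.Entry<String, Long> e : activeNum.entrySet()) {
          activeRatio.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
      } finally {
        lock.unlock();
      }
    }
    // Reset for the next memtable; this sketch does not address the lost-update
    // problem described in the issue for concurrently written memtables.
    if (seen != null) {
      seen.clear();
    }
  }
}
```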
> Improper lock level and (maybe) error in ActiveTimeSeriesCounter
> -----------------------------------------------------------------
>
> Key: IOTDB-637
> URL: https://issues.apache.org/jira/browse/IOTDB-637
> Project: Apache IoTDB
> Issue Type: Bug
> Components: Core/Engine
> Affects Versions: 0.10.0-SNAPSHOT
> Reporter: Xiangdong Huang
> Priority: Major
>
> When a SG is flushed, the updateActiveRatio() of
> ActiveTimeSeriesCounter.class will be called.
> The function executes the following 2 steps:
> # calculate the new active time series ratio in recently flushed memtable;
> # reset the info for calculating the active ratio in next memtable.
> The problem is that we may have 2 (or more) memtables for each SG. While one
> is waiting in the flush queue, the other may have already received data. If so,
> after the first memtable is flushed, the info in ActiveTimeSeriesCounter will
> be reset and therefore the info about the second memtable is lost.
>
> The above case is more likely to occur when the time range partition feature
> is enabled. In this case, a SG may have several subSGs (each one
> representing a time range of the SG).
>
> Lock Level:
> # As all maps are concurrent maps, there is no need to use a lock in
> getActiveRatio();
> # `storageGroupHllMap.get(storageGroup).cardinality();` is not an O(1)
> operation, so we can reduce the scope of the lock in updateActiveRatio();
> maybe we can acquire the lock only when `activeTimeSeriesNum !=
> activeTimeSeriesNumMap.get(storageGroup)` is true.
> What do you think? [~liurui]
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)