Hi guys,
I believe filtering and aggregating data by tags is a common need.
Putting all the attributes into the path is not a good idea: it makes the path
extremely long and slows down MTree searching, so we store some of the
attributes as tags instead. But that doesn't mean tags are less important.
Let's take the following ECS management scenario as an example. IoTDB stores
the cpu_util of each ECS instance. Besides that, an ECS instance has static
attributes such as region_id, available_zone, hostname, CPU, memory, storage,
and OS. Since CPU, memory, and storage are numbers and OS is a string that
contains white spaces, they are stored as tags, while the other attributes are
stored as levels in the path, like
root.${region_id}.${available_zone}.${hostname}.cpu_util.
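To make the split concrete, here is a minimal Python sketch of how one instance's
attributes would be divided between path levels and tags. The EcsInstance
class, the to_series helper, the units (cores, GB), and the sample values are
all hypothetical, and it assumes tag values are kept as strings.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class EcsInstance:
    # Hypothetical record; units are assumptions for illustration only.
    region_id: str
    available_zone: str
    hostname: str
    cpu: int          # number of cores (assumed)
    memory_gb: int    # memory in GB (assumed)
    storage_gb: int   # storage in GB (assumed)
    os: str           # may contain spaces, e.g. "CentOS 7.9"


def to_series(inst: EcsInstance) -> Tuple[str, Dict[str, str]]:
    """Split an instance's attributes into a series path and a tag map."""
    # Low-cardinality, identifier-like attributes become path levels.
    path = f"root.{inst.region_id}.{inst.available_zone}.{inst.hostname}.cpu_util"
    # Numeric attributes and strings with spaces become tags (stored as strings).
    tags = {
        "CPU": str(inst.cpu),
        "memory": str(inst.memory_gb),
        "storage": str(inst.storage_gb),
        "OS": inst.os,
    }
    return path, tags
```

For example, to_series(EcsInstance("cn_hangzhou", "zone_h", "host01", 4, 16,
100, "CentOS 7.9")) yields the path root.cn_hangzhou.zone_h.host01.cpu_util
with OS="CentOS 7.9" carried as a tag rather than a path level.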
Let's say some ECS instances have shown abnormally high cpu_util over the last
hour, and we want to know whether the problem is caused by a certain OS
version. The query could look like this:
> SELECT OS, COUNT(cpu_util) FROM root.** WHERE cpu_util > 95.0
> GROUP BY TAG OS ALIGN BY DEVICE
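One possible reading of what this query should return, sketched in Python:
filter the points first, then group the surviving points by the OS tag and
count them. This is only an illustration of the intended semantics under the
assumption that COUNT counts data points (not devices) and that the input is
already restricted to the last hour; count_hot_points_by_os and the sample
data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_hot_points_by_os(
    series: Iterable[Tuple[Dict[str, str], List[float]]],
    threshold: float = 95.0,
) -> Dict[str, int]:
    """Count cpu_util points above threshold, grouped by the OS tag.

    series: one (tags, values) pair per device's cpu_util series,
    assumed to already cover only the last hour.
    """
    counts: Counter = Counter()
    for tags, values in series:
        # WHERE cpu_util > threshold: keep only the hot points.
        hot = sum(1 for v in values if v > threshold)
        # Like SQL GROUP BY, groups with no rows left after filtering are omitted.
        if hot:
            counts[tags.get("OS", "<no OS tag>")] += hot
    return dict(counts)


data = [
    ({"OS": "CentOS 7.9"}, [96.0, 50.0, 99.5]),
    ({"OS": "Ubuntu 20.04"}, [10.0]),
    ({"OS": "CentOS 7.9"}, [97.0]),
]
print(count_hot_points_by_os(data))  # {'CentOS 7.9': 3}
```

Whether COUNT should count points or distinct devices per OS is a design
choice the syntax would need to pin down.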
With the ability to filter and aggregate by tags, IoTDB could become more
powerful for analytical processing. What do you think?
Any suggestions are welcome :D
Zhong Wang,
Alibaba Group