Hi~ Here are my ideas:
Problem: After the server has been running for a while, there will be many fragmented TsFiles, and these TsFiles belong to different storage groups. When a query comes, we must load the metadata of each TsFile to determine whether the specified time series exists in it.

Solution: Create a bloom filter over the time series contained in each TsFile. If the filter shows that the specified time series does not exist in a file, we do not need to read that file's metadata.

Design:
(1) Implementation of the bloom filter: use the BitSet implemented in Java; the number of bits can be specified by the user.
(2) Storage location and size: the bloom filter is stored at the end of the metadata, laid out as follows: the BitSet size (int), followed by the BitSet.
(3) Interface: add a getBloomFilter interface to TsfileMetadata to get the bloom filter, and use its "contains" method to determine whether a path exists in the TsFile.

If you have any ideas, please feel free to discuss them with me. :D

------------------ Original Message ------------------
From: "Claude Warren"<[email protected]>;
Sent: Tuesday, October 15, 2019, 5:52
To: "Xiangdong Huang"<[email protected]>;"Julian Feinauer"<[email protected]>;
Cc: "dev"<[email protected]>;
Subject: Re: Add bloom filters to TsFile

Greetings,

There is a discussion on the [email protected] mailing list concerning the addition of bloom filters to commons. Please take a look and comment there.

Thx,
Claude

On Thu, Sep 12, 2019 at 3:44 PM Xiangdong Huang <[email protected]> wrote:

> +1 for bloom filter!
> +1 for the implementation (but there seems to be no license file in the repo...)
>
> By the way, it seems that there are some new variants of bloom filter,
> e.g., ones supporting range queries.
> I am not sure whether we need such variants, e.g., to support checking
> whether a time series set like "root.a.b.*.speed" exists.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> On Thu, Sep 12, 2019 at 12:29, Claude Warren <[email protected]> wrote:
>
>> In my reading of the short message, it seems like it would make sense to
>> use a bloom filter to determine whether the "gear" is in the file. I have
>> the library that I am proposing to move to commons. It can be found at
>>
>> https://github.com/Claudenw/BloomFilter/tree/MultiFilter/src/main/java/org/xenei/bloomfilter
>>
>> Claude
>>
>> On Tue, Sep 10, 2019 at 3:45 PM Julian Feinauer <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > I like the idea. I'm just adding Claude here, as we talked yesterday
>> > about a bloom filter implementation he has already done.
>> >
>> > @[email protected] <[email protected]> what do you think? : )
>> >
>> > Julian
>> > ------------------------------
>> > *From:* Tian Jiang <[email protected]>
>> > *Sent:* Tuesday, September 10, 2019 5:14:33 AM
>> > *To:* [email protected] <[email protected]>
>> > *Subject:* Add bloom filters to TsFile
>> >
>> > Greetings,
>> >
>> > My recent reading reminds me that the bloom filter is standard equipment
>> > in key-value databases. Although IoTDB is not one of them (at least not
>> > typically), a bloom filter still helps a lot in various situations. For
>> > example, our recent experiments gave us the illusion that the set of time
>> > series in a storage group remains unchanged. However, that is not the case.
>> >
>> > Naturally, in real situations, the number of time series grows over time,
>> > for reasons like adding new gears. The old files do not contain such time
>> > series. Without the help of bloom filters, we have to check each old file
>> > only to find that the time series is not there. To my knowledge, this may
>> > take a lot of time.
>> >
>> > So, I suggest we add a bloom filter (or some more efficient structure) to
>> > each TsFile to help skip unwanted files.
>> >
>> > Tian Jiang
>> > [email protected]
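For discussion, here is a minimal Java sketch of the proposed design, assuming a filter backed by java.util.BitSet with a user-chosen number of bits, serialized as an int size followed by the bits, and queried with a contains check before reading a TsFile's metadata. The class PathBloomFilter, its method names, the double-hashing scheme (Arrays.hashCode plus FNV-1a over the path's UTF-8 bytes), and storing the hash count next to the size are all hypothetical choices for illustration, not IoTDB's actual TsfileMetadata/getBloomFilter API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch: a bloom filter over the time series paths in one TsFile.
 * Names and the serialized layout are illustrative assumptions, not IoTDB's API.
 */
public class PathBloomFilter {
    private final int numBits;   // user-specified number of bits (m)
    private final int numHashes; // number of hash functions (k); assumed, not in the proposal
    private final BitSet bits;

    public PathBloomFilter(int numBits, int numHashes) {
        this(numBits, numHashes, new BitSet(numBits));
    }

    private PathBloomFilter(int numBits, int numHashes, BitSet bits) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = bits;
    }

    /** k = (m / n) * ln 2 minimizes the false-positive rate for n expected paths. */
    public static int optimalNumHashes(int numBits, int expectedPaths) {
        return Math.max(1, (int) Math.round((double) numBits / expectedPaths * Math.log(2)));
    }

    // Double hashing: the i-th bit index is (h1 + i * h2) mod m.
    private int[] indexes(String path) {
        byte[] data = path.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(data);
        int h2 = 0x811c9dc5;          // FNV-1a 32-bit offset basis
        for (byte b : data) {
            h2 ^= (b & 0xff);
            h2 *= 0x01000193;         // FNV-1a 32-bit prime
        }
        h2 |= 1;                      // make the step odd, and therefore non-zero
        int[] result = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            result[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return result;
    }

    public void add(String path) {
        for (int idx : indexes(path)) {
            bits.set(idx);
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    public boolean contains(String path) {
        for (int idx : indexes(path)) {
            if (!bits.get(idx)) {
                return false;
            }
        }
        return true;
    }

    /** Assumed layout: numBits (int), numHashes (int), then ceil(numBits / 8) bytes of bits. */
    public byte[] serialize() {
        byte[] raw = Arrays.copyOf(bits.toByteArray(), (numBits + 7) / 8); // pad to a fixed length
        return ByteBuffer.allocate(8 + raw.length).putInt(numBits).putInt(numHashes).put(raw).array();
    }

    public static PathBloomFilter deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int numBits = buf.getInt();
        int numHashes = buf.getInt();
        byte[] raw = new byte[(numBits + 7) / 8];
        buf.get(raw);
        return new PathBloomFilter(numBits, numHashes, BitSet.valueOf(raw));
    }

    public static void main(String[] args) {
        int m = 1024;
        PathBloomFilter filter = new PathBloomFilter(m, optimalNumHashes(m, 100));
        filter.add("root.sg1.d1.s1");
        filter.add("root.sg1.d1.s2");

        PathBloomFilter loaded = PathBloomFilter.deserialize(filter.serialize());
        // A query only needs to read this file's metadata when contains() returns true.
        for (String path : new String[] {"root.sg1.d1.s1", "root.sg1.d2.speed"}) {
            System.out.println(path + " -> " + (loaded.contains(path) ? "read metadata" : "skip file"));
        }
    }
}
```

A false from contains means the path is definitely not in the file, so its metadata can be skipped; true only means "possibly present" and the metadata must still be read, because bloom filters can give false positives but never false negatives. Note that a plain bloom filter like this answers exact-path membership only, so a pattern such as "root.a.b.*.speed" cannot be checked directly.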
