Hi Kaifeng,

> (1) Implementation of bloom filter: using BitSet implemented in java, the number of bits can be specified by the user

Why not read the discussion that Claude mentioned? You can find it at [1]. You can evaluate whether it is suitable for TsFile, learn how Claude implemented it, and give your opinion if you want.

[1] https://lists.apache.org/[email protected]:lte=1M:bloom%20filter

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

Kaifeng Xue (School of Software, Nanjing University) <[email protected]> wrote on Tue, Oct 15, 2019 at 7:10 PM:

> Hi~
> Here are my ideas:
>
> Problem: When the server runs for a period of time, there will be many
> fragmented TsFiles. The storage groups in these TsFiles are different. When
> a query comes, we must load the metadata of each TsFile to determine
> whether the specified time series exists in the TsFile.
>
> Solution: Create a bloom filter for the time series contained in each
> TsFile. If the filter reports that the specified time series does not
> exist, we do not need to read the metadata.
>
> Design:
> (1) Implementation of the bloom filter: use the BitSet implemented in Java;
> the number of bits can be specified by the user.
> (2) Storage location and size: it is stored in the last part of the metadata.
> The specific data is as follows: BitSet size (int), BitSet
>
> Interface:
> Add the getBloomFilter interface to TsfileMetadata to get the bloom
> filter, and use its "contains" method to determine whether a path exists in
> the TsFile.
>
> If you have some ideas, please feel free to discuss with me. :D
>
> ------------------ Original Message ------------------
> From: "Claude Warren"<[email protected]>;
> Sent: Tuesday, October 15, 2019, 5:52 PM
> To: "Xiangdong Huang"<[email protected]>;"Julian Feinauer"<[email protected]>;
> Cc: "dev"<[email protected]>;
> Subject: Re: Add bloom filters to TsFile
>
> Greetings,
>
> This is a discussion on the [email protected] mailing list concerning
> the addition of bloom filters to commons. Please take a look and comment
> there.
>
> Thx,
> Claude
>
> On Thu, Sep 12, 2019 at 3:44 PM Xiangdong Huang <[email protected]> wrote:
>
> > +1 for bloom filter!
> > +1 for implementation (but there seems to be no license file in the repo...)
> >
> > By the way, it seems that there are some new variants of the bloom filter,
> > e.g., ones supporting range queries.
> > I am not sure whether we need these variants, e.g., to support checking
> > whether a time series set "root.a.b.*.speed" exists.
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> > Claude Warren <[email protected]> wrote on Thu, Sep 12, 2019 at 12:29 AM:
> >
> >> In my reading of the short message it seems like it would make sense to
> >> use a bloom filter to determine if the "gear" is in the file. I have the
> >> library that I am proposing to move to commons. It can be found at
> >>
> >> https://github.com/Claudenw/BloomFilter/tree/MultiFilter/src/main/java/org/xenei/bloomfilter
> >>
> >> Claude
> >>
> >> On Tue, Sep 10, 2019 at 3:45 PM Julian Feinauer <
> >> [email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I like the idea. I'm just adding Claude here as we talked yesterday
> >> > about a bloom filter implementation he has already done.
> >> >
> >> > @[email protected] <[email protected]> what do you think? : )
> >> >
> >> > Julian
> >> > ------------------------------
> >> > *From:* Tian Jiang <[email protected]>
> >> > *Sent:* Tuesday, September 10, 2019 5:14:33 AM
> >> > *To:* [email protected] <[email protected]>
> >> > *Subject:* Add bloom filters to TsFile
> >> >
> >> > Greetings,
> >> >
> >> > The recent readings remind me that the bloom filter is standard
> >> > equipment in K-VDBs. Although IoTDB is not one of them (at least not
> >> > typically), the bloom filter still helps a lot in various situations.
> >> > For example, our recent experiments gave us the illusion that the set
> >> > of time series in a storage group remains unchanged. However, that is
> >> > not the case.
> >> >
> >> > Naturally, in real situations, the number of time series grows over
> >> > time, for reasons like adding new gears. The old files do not contain
> >> > such a time series. Without the help of bloom filters, we have to check
> >> > each old file only to find that there is no such time series. To my
> >> > knowledge, this may take a lot of time.
> >> >
> >> > So, I suggest we add a bloom filter (or some more efficient one) to
> >> > each TsFile to help skip unwanted files.
> >> >
> >> > Tian Jiang
> >> > [email protected]
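
The design proposed in this thread (a BitSet-backed filter with a user-specified number of bits, stored as "BitSet size (int), BitSet" and queried through a "contains" method) could be sketched in Java roughly as follows. This is only an illustration, not the IoTDB or Commons implementation. The class name SketchBloomFilter, the fixed count of three hash probes, the seeded FNV-1a style hash, and the choice to store the size as a bit count with the BitSet padded to a fixed byte length are all assumptions made for the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of a BitSet-backed bloom filter for time series paths. */
class SketchBloomFilter {
    // Number of probe positions per path; an arbitrary choice for this sketch.
    private static final int NUM_HASHES = 3;

    private final int numBits;   // user-specified size of the bit array
    private final BitSet bits;

    SketchBloomFilter(int numBits) {
        this(numBits, new BitSet(numBits));
    }

    private SketchBloomFilter(int numBits, BitSet bits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        this.numBits = numBits;
        this.bits = bits;
    }

    /** Records a time series path such as "root.a.b.c.speed". */
    void add(String path) {
        for (int i = 0; i < NUM_HASHES; i++) {
            bits.set(position(path, i));
        }
    }

    /** Returns false if the path is definitely absent, true if it may be present. */
    boolean contains(String path) {
        for (int i = 0; i < NUM_HASHES; i++) {
            if (!bits.get(position(path, i))) {
                return false;
            }
        }
        return true;
    }

    // Seeded FNV-1a style hash reduced to a bit index in [0, numBits).
    private int position(String path, int seed) {
        int h = 0x811C9DC5 ^ (seed * 0x9E3779B9);
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return Math.floorMod(h, numBits);
    }

    /** Assumed layout: numBits (int), then the BitSet padded to ceil(numBits / 8) bytes. */
    byte[] serialize() {
        int byteLen = (numBits + 7) / 8;
        byte[] raw = bits.toByteArray();   // trailing zero bytes are omitted by BitSet
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + byteLen);
        buf.putInt(numBits);
        buf.put(raw);                      // the rest of the buffer stays zero
        return buf.array();
    }

    /** Reads a filter written by serialize() from the buffer's current position. */
    static SketchBloomFilter deserialize(ByteBuffer buf) {
        int numBits = buf.getInt();
        byte[] raw = new byte[(numBits + 7) / 8];
        buf.get(raw);
        return new SketchBloomFilter(numBits, BitSet.valueOf(raw));
    }
}
```

Under these assumptions, a query would deserialize the filter from the tail of each file's metadata and read the rest of the metadata only when contains returns true for the requested path. A false result means the path is certainly not in that file, while a true result can still be a false positive, so the metadata remains the final authority.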
