Hi Kaifeng,

> (1) Implementation of bloom filter: using BitSet implemented in java, the
> number of bits can be specified by the user

Why not read the discussions that Claude mentioned? You can find them
in [1].

You can evaluate whether Claude's implementation is suitable for TsFile,
learn how he implemented it, and give your opinion if you want.
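
To make the comparison concrete, here is a minimal sketch of the design
quoted below: a java.util.BitSet whose number of bits is chosen by the
user, serialized as the bit count (int) followed by the bits. The class
name PathBloomFilter, the fixed number of hash probes, and the double
hashing scheme are my assumptions for illustration only; this is neither
the TsFile code nor Claude's library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of a per-file bloom filter over time series paths. */
final class PathBloomFilter {
    // Assumption: a fixed number of probes per path; the proposal does not specify one.
    private static final int HASHES = 3;

    private final int numBits;   // filter size in bits, chosen by the user
    private final BitSet bits;

    PathBloomFilter(int numBits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    /** Records a time series path in the filter. */
    void add(String path) {
        for (int i = 0; i < HASHES; i++) {
            bits.set(index(path, i));
        }
    }

    /** False means the path is definitely absent; true means it may be present. */
    boolean contains(String path) {
        for (int i = 0; i < HASHES; i++) {
            if (!bits.get(index(path, i))) {
                return false;
            }
        }
        return true;
    }

    /** Layout from the proposal: bit count (int), then the BitSet bytes. */
    byte[] serialize() {
        int byteLen = (numBits + 7) / 8;
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + byteLen);
        buf.putInt(numBits);
        // toByteArray() trims trailing zero bytes; the rest of the buffer stays zero.
        buf.put(bits.toByteArray());
        return buf.array();
    }

    static PathBloomFilter deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int numBits = buf.getInt();
        byte[] raw = new byte[(numBits + 7) / 8];
        buf.get(raw);
        PathBloomFilter filter = new PathBloomFilter(numBits);
        filter.bits.or(BitSet.valueOf(raw));
        return filter;
    }

    // Double hashing (an assumed scheme): index_i = (h1 + i * h2) mod numBits.
    private int index(String path, int i) {
        int h1 = path.hashCode();
        int h2 = fnv1a(path);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the path.
    private static int fnv1a(String s) {
        int hash = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }
}
```

A reader would call contains(path) before loading a file's metadata. A
false result means the path is definitely not in the file, so the
metadata read can be skipped; a true result can be a false positive, so
the metadata still has to be checked.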

[1]
https://lists.apache.org/[email protected]:lte=1M:bloom%20filter

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


南京大学软件学院薛恺丰 <[email protected]> wrote on Tue, Oct 15, 2019 at 7:10 PM:

> Hi~
> Here are my ideas:
>
>
> Problem: After the server has run for a period of time, there will be many
> fragmented TsFiles, and the storage groups in these TsFiles differ. When
> a query comes, we must load the metadata of each TsFile to determine
> whether the specified time series exists in it.
>
>
> Solution: create a bloom filter over the time series contained in each
> TsFile. If the filter reports that the specified time series does not
> exist, we do not need to read the metadata.
>
>
> Design:
> (1) Implementation of the bloom filter: use the BitSet class provided by
> Java; the number of bits can be specified by the user.
> (2) Storage location and size: the filter is stored in the last part of the
> metadata. The layout is: BitSet size (int), followed by the BitSet.
>
>
> Interface:
> Add a getBloomFilter interface to TsfileMetadata that returns the bloom
> filter, and use its "contains" method to determine whether a path exists in
> the TsFile.
>
>
>
> If you have any ideas, please feel free to discuss them with me. :D
>
>
>
>
> ------------------ Original Message ------------------
> From: "Claude Warren"<[email protected]>;
> Sent: Tuesday, October 15, 2019, 5:52 PM
> To: "Xiangdong Huang"<[email protected]>;"Julian Feinauer"<
> [email protected]>;
> Cc: "dev"<[email protected]>;
> Subject: Re: Add bloom filters to TsFile
>
>
>
> Greetings,
>
> There is a discussion on the [email protected] mailing list concerning
> the addition of bloom filters to Commons. Please take a look and comment
> there.
>
> Thx,
> Claude
>
> On Thu, Sep 12, 2019 at 3:44 PM Xiangdong Huang <[email protected]>
> wrote:
>
> > +1 for the bloom filter!
> > +1 for the implementation (but there seems to be no license file in the
> > repo...)
> >
> > By the way, there seem to be some new variants of the bloom filter,
> > e.g., ones supporting range queries.
> > I am not sure whether we need such variants, e.g., to check
> > whether a time series set like "root.a.b.*.speed" exists.
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > Claude Warren <[email protected]> wrote on Thu, Sep 12, 2019 at 12:29 AM:
> >
> >> In my reading of the short message it seems like it would make sense to
> >> use a bloom filter to determine if the "gear" is in the file. I have the
> >> library that I am proposing to move to commons. It can be found at
> >>
> >> https://github.com/Claudenw/BloomFilter/tree/MultiFilter/src/main/java/org/xenei/bloomfilter
> >>
> >> Claude
> >>
> >> On Tue, Sep 10, 2019 at 3:45 PM Julian Feinauer <
> >> [email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I like the idea. I'm just adding Claude here as we talked yesterday
> >> > about a bloom filter implementation he has already done.
> >> >
> >> > @[email protected] <[email protected]> what do you think? : )
> >> >
> >> > Julian
> >> > ------------------------------
> >> > *From:* Tian Jiang <[email protected]>
> >> > *Sent:* Tuesday, September 10, 2019 5:14:33 AM
> >> > *To:* [email protected] <[email protected]>
> >> > *Subject:* Add bloom filters to TsFile
> >> >
> >> >
> >> >
> >> > Greetings,
> >> >
> >> >
> >> > My recent reading reminded me that the bloom filter is standard
> >> > equipment in K-V DBs. Although IoTDB is not one of them (at least not
> >> > typically), the bloom filter still helps a lot in various situations.
> >> > For example, our recent experiments gave us the illusion that the set
> >> > of time series in a storage group remains unchanged. However, that is
> >> > not the case.
> >> >
> >> >
> >> > Naturally, in real situations, the number of time series grows over
> >> > time, for reasons like adding new gears. The old files do not contain
> >> > such a time series. Without the help of bloom filters, we have to check
> >> > each old file only to find that there is no such time series. To my
> >> > knowledge, this may take a lot of time.
> >> >
> >> >
> >> > So, I suggest we add a bloom filter (or some more efficient one) to
> >> > each TsFile to help skip unwanted files.
> >> >
> >> >
> >> > Tian Jiang
> >> > [email protected]
> >> >
> >>
> >
