Hi~
Here are my ideas:

Problem: After the server has been running for a while, there will be many 
fragmented tsfiles, and the storage groups in these tsfiles differ. When a 
query arrives, we must load the metadata of each tsfile to determine whether 
the specified time series exists in that tsfile.


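Solution: Create a bloom filter over the time series contained in each tsfile. 
If the filter reports that the specified time series is absent, we do not need 
to read the metadata. A bloom filter never gives a false negative, so skipping 
is safe; a false positive only costs one unnecessary metadata read.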


Design:
(1) Implementation of the bloom filter: use Java's BitSet (java.util.BitSet); 
the number of bits can be specified by the user.
(2) Storage location and size: the filter is stored in the last part of the 
metadata, serialized as: BitSet size (int), followed by the BitSet.
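
To make the design more concrete, here is a minimal sketch in Java. It is only 
an illustration, not the actual IoTDB code. The class name SimpleBloomFilter, 
the hashCount parameter and the seeded polynomial hash are my own assumptions. 
I also assume that "BitSet size" means the number of bits and that the filter 
bytes run to the end of the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of a BitSet-backed bloom filter (not IoTDB code). */
final class SimpleBloomFilter {
    private final int size;       // number of bits, chosen by the user
    private final int hashCount;  // number of hash functions (assumed parameter)
    private final BitSet bits;

    SimpleBloomFilter(int size, int hashCount) {
        this(size, hashCount, new BitSet(size));
    }

    private SimpleBloomFilter(int size, int hashCount, BitSet bits) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.size = size;
        this.hashCount = hashCount;
        this.bits = bits;
    }

    /** Bit position of the i-th hash: a seeded polynomial hash, for illustration only. */
    private int position(String path, int i) {
        int h = 17 + i;
        int multiplier = 31 + 2 * i;
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            h = h * multiplier + b;
        }
        return Math.floorMod(h, size);
    }

    void add(String path) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(path, i));
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    boolean contains(String path) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(path, i))) {
                return false;
            }
        }
        return true;
    }

    /** Layout from the design: BitSet size (int), followed by the BitSet bytes. */
    byte[] serialize() {
        byte[] raw = bits.toByteArray();
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + raw.length);
        buf.putInt(size);
        buf.put(raw);
        return buf.array();
    }

    /** Assumes the reader already knows hashCount; the design does not store it. */
    static SimpleBloomFilter deserialize(byte[] data, int hashCount) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int size = buf.getInt();
        byte[] raw = new byte[buf.remaining()];
        buf.get(raw);
        return new SimpleBloomFilter(size, hashCount, BitSet.valueOf(raw));
    }
}
```

Note that BitSet.toByteArray() drops trailing zero bytes, so the serialized 
length can vary. If the filter were not the last field, a byte length would 
have to be stored as well.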


Interface:
Add a getBloomFilter interface to TsfileMetadata that returns the bloom filter, 
and use the filter's "contains" method to determine whether a path may exist in 
the tsfile.
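
As a rough sketch of how a query could use this interface to skip files: the 
names PathFilter, FileMetadataStub and candidateFiles below are hypothetical 
stand-ins, and an exact HashSet plays the role of the bloom filter so that the 
example stays short. A real filter could also return false positives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BloomFilterPruning {
    /** Hypothetical stand-in for the filter returned by getBloomFilter. */
    interface PathFilter {
        boolean contains(String path);
    }

    /** Hypothetical stand-in for TsfileMetadata; only the proposed accessor is modelled. */
    static final class FileMetadataStub {
        final String fileName;
        private final PathFilter filter;

        FileMetadataStub(String fileName, PathFilter filter) {
            this.fileName = fileName;
            this.filter = filter;
        }

        PathFilter getBloomFilter() {
            return filter;
        }
    }

    /** Keeps only the files whose filter says the path may be present. */
    static List<String> candidateFiles(List<FileMetadataStub> files, String path) {
        List<String> result = new ArrayList<>();
        for (FileMetadataStub file : files) {
            if (file.getBloomFilter().contains(path)) {
                result.add(file.fileName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The old file was written before the second device was added.
        Set<String> oldPaths = new HashSet<>(Arrays.asList("root.sg1.d1.speed"));
        Set<String> newPaths = new HashSet<>(
                Arrays.asList("root.sg1.d1.speed", "root.sg1.d2.speed"));
        List<FileMetadataStub> files = Arrays.asList(
                new FileMetadataStub("old.tsfile", oldPaths::contains),
                new FileMetadataStub("new.tsfile", newPaths::contains));
        // Only new.tsfile needs a full metadata read for this path.
        System.out.println(candidateFiles(files, "root.sg1.d2.speed")); // prints [new.tsfile]
    }
}
```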



If you have any ideas, please feel free to discuss them with me. :D




------------------ Original Message ------------------
From: "Claude Warren"<[email protected]>;
Sent: October 15, 2019 (Tuesday) 5:52
To: "Xiangdong Huang"<[email protected]>;"Julian Feinauer"<[email protected]>;
Cc: "dev"<[email protected]>;
Subject: Re: Add bloom filters to TsFile



Greetings,

This is a discussion on the [email protected] mailing list concerning
the addition of bloom filters to commons.  Please take a look and comment
there.

Thx,
Claude

On Thu, Sep 12, 2019 at 3:44 PM Xiangdong Huang <[email protected]> wrote:

> +1 for bloom filter!
> +1 for implementation (but there seems to be no license file in the repo...)
>
> By the way, it seems that there are some new variants of the bloom filter,
> e.g., ones supporting range queries.
> I am not sure whether we need these variants, e.g., to support checking
> whether a time series set such as "root.a.b.*.speed" exists.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
>
>
> Claude Warren <[email protected]> wrote on Thu, Sep 12, 2019 at 12:29:
>
>> In my reading of the short message it seems like it would make sense to
>> use a bloom filter to determine if the "gear" is in the file.   I have the
>> library that I am proposing to move to commons.  It can be found at
>>
>> https://github.com/Claudenw/BloomFilter/tree/MultiFilter/src/main/java/org/xenei/bloomfilter
>>
>> Claude
>>
>> On Tue, Sep 10, 2019 at 3:45 PM Julian Feinauer <
>> [email protected]> wrote:
>>
>> > Hi,
>> >
>> > I like the idea. I'm just adding Claude here as we talked yesterday
>> > about a bloom filter implementation he has already done.
>> >
>> > @[email protected] <[email protected]> what do you think? : )
>> >
>> > Julian
>> > ------------------------------
>> > *From:* Tian Jiang <[email protected]>
>> > *Sent:* Tuesday, September 10, 2019 5:14:33 AM
>> > *To:* [email protected] <[email protected]>
>> > *Subject:* Add bloom filters to TsFile
>> >
>> >
>> >
>> > Greetings,
>> >
>> >
>> > The recent readings remind me that the bloom filter is standard
>> > equipment in K-VDBs. Although IoTDB is not one of them (at least not
>> > typically), the bloom filter still helps a lot in various situations.
>> > For example, our recent experiments gave us an illusion that the time
>> > series in a storage group remains unchanged. However, that is not the
>> > case.
>> >
>> >
>> > Naturally, in real situations, the number of time series grows over
>> > time, due to reasons like adding new gears. The old files do not contain
>> > such a time series. Without the help of bloom filters, we have to check
>> > each old file only to find that there is no such time series. To my
>> > knowledge, this may take a lot of time.
>> >
>> >
>> > So, I suggest we add a bloom filter (or some more efficient one) to each
>> > TsFile to help skip unwanted files.
>> >
>> >
>> > Tian Jiang
>> > [email protected]
>> >
>>
>
