Hi~

I have read the mailing list carefully, and Claude's code is awesome and well structured. We will give your bloom filter a try. Thanks :D

------------------ Original Message ------------------
From: "Claude Warren"<[email protected]>;
Sent: Tuesday, October 15, 2019, 8:46 PM
To: "Xiangdong Huang"<[email protected]>;
Cc: "dev"<[email protected]>;
Subject: Re: Add bloom filters to TsFile

I would hope that the commons bloom filter would do what you need to do
without you having to write custom code other than some interfacing between
your objects and the hashes.

Claude

On Tue, Oct 15, 2019 at 1:43 PM Xiangdong Huang <[email protected]> wrote:

> Hi Kaifeng,
>
> > (1) Implementation of bloom filter: using BitSet implemented in java,
> > the number of bits can be specified by the user
>
> Why not read the discussions that Claude mentioned? You can find them
> in [1].
>
> You can evaluate whether it is suitable for TsFile, learn how Claude
> implemented it, and give your opinion if you want.
>
> [1]
> https://lists.apache.org/[email protected]:lte=1M:bloom%20filter
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> On Tue, Oct 15, 2019 at 7:10 PM, <[email protected]> wrote:
>
>> Hi~
>> Here are my ideas:
>>
>> Problem: When the server runs for a period of time, there will be many
>> fragmented tsfiles. The storage groups in these tsfiles are different. When
>> a query comes, we must load the metadata of each tsfile to determine
>> whether the specified time series exists in the tsfile.
>>
>> Solution: create a bloom filter for the time series contained in each
>> tsfile. If the filter reports that the specified time series is not in a
>> tsfile, we do not need to read that file's metadata.
>>
>> Design:
>> (1) Implementation of bloom filter: using BitSet implemented in java, the
>> number of bits can be specified by the user
>> (2) storage location and size: it is stored in the last part of metadata.
>> The specific data is as follows: BitSet size (int), BitSet
>>
>> Interface:
>> Add the getBloomFilter interface to TsfileMetadata to get the bloom
>> filter, and use its "contains" method to determine whether a path exists in
>> the tsfile
>>
>> If you have some ideas, please feel free to discuss with me. :D
>>
>> ------------------ Original Message ------------------
>> From: "Claude Warren"<[email protected]>;
>> Sent: Tuesday, October 15, 2019, 5:52 PM
>> To: "Xiangdong Huang"<[email protected]>;"Julian Feinauer"<[email protected]>;
>> Cc: "dev"<[email protected]>;
>> Subject: Re: Add bloom filters to TsFile
>>
>> Greetings,
>>
>> There is a discussion on the [email protected] mailing list concerning
>> the addition of bloom filters to commons. Please take a look and comment
>> there.
>>
>> Thx,
>> Claude
>>
>> On Thu, Sep 12, 2019 at 3:44 PM Xiangdong Huang <[email protected]> wrote:
>>
>> > +1 for bloom filter!
>> > +1 for implementation (but there seems to be no license file in the repo...)
>> >
>> > By the way, it seems that there are some new variants of bloom filter,
>> > e.g., supporting range query.
>> > I am not sure whether we need the variants, e.g., to support checking
>> > whether a timeseries set such as "root.a.b.*.speed" exists.
>> >
>> > Best,
>> > -----------------------------------
>> > Xiangdong Huang
>> > School of Software, Tsinghua University
>> >
>> > Claude Warren <[email protected]> wrote on Thu, Sep 12, 2019 at 12:29:
>> >
>> >> In my reading of the short message it seems like it would make sense to
>> >> use a bloom filter to determine if the "gear" is in the file. I have the
>> >> library that I am proposing to move to commons.
>> >> It can be found at
>> >>
>> >> https://github.com/Claudenw/BloomFilter/tree/MultiFilter/src/main/java/org/xenei/bloomfilter
>> >>
>> >> Claude
>> >>
>> >> On Tue, Sep 10, 2019 at 3:45 PM Julian Feinauer <[email protected]> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I like the idea. I'm just adding Claude here as we talked yesterday
>> >> > about a bloom filter implementation he has already done.
>> >> >
>> >> > @[email protected] <[email protected]> what do you think? : )
>> >> >
>> >> > Julian
>> >> > ------------------------------
>> >> > *From:* Tian Jiang <[email protected]>
>> >> > *Sent:* Tuesday, September 10, 2019 5:14:33 AM
>> >> > *To:* [email protected] <[email protected]>
>> >> > *Subject:* Add bloom filters to TsFile
>> >> >
>> >> > Greetings,
>> >> >
>> >> > The recent readings remind me that the bloom filter is standard
>> >> > equipment in K-V DBs. Although IoTDB is not one of them (at least not
>> >> > typically), the bloom filter still helps a lot in various situations.
>> >> > For example, our recent experiments gave us the illusion that the time
>> >> > series in a storage group remain unchanged. However, that is not the
>> >> > case.
>> >> >
>> >> > Naturally, in real situations, the number of time series grows over
>> >> > time, due to reasons like adding new gears. The old files do not
>> >> > contain such a time series. Without the help of bloom filters, we have
>> >> > to check each old file only to find that there is no such time series.
>> >> > To my knowledge, this may take a lot of time.
>> >> >
>> >> > So, I suggest we add a bloom filter (or some more efficient one) to
>> >> > each TsFile to help skip unwanted files.
>> >> >
>> >> > Tian Jiang
>> >> > [email protected]
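
For reference, the design proposed in this thread (a BitSet-backed filter whose bit count is chosen by the user, stored as "BitSet size (int), BitSet", and queried through a "contains" method) could look roughly like the sketch below. It is only an illustration under stated assumptions and is neither the IoTDB code nor Claude's library: the class name PathBloomFilter, the hashCount parameter, the choice of double hashing over String.hashCode and FNV-1a, and the zero-padding of the bits to a fixed byte length are all hypothetical choices made here so that the example is complete.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative sketch only; every name in this class is hypothetical. */
class PathBloomFilter {
    private final int size;       // number of bits, chosen by the user
    private final int hashCount;  // probes per path (assumption: not part of the proposal)
    private final BitSet bits;

    PathBloomFilter(int size, int hashCount) {
        this(size, hashCount, new BitSet(size));
    }

    private PathBloomFilter(int size, int hashCount, BitSet bits) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.size = size;
        this.hashCount = hashCount;
        this.bits = bits;
    }

    /** Records a time series path in the filter. */
    void add(String path) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(path, i));
        }
    }

    /** Returns false only if the path was definitely never added. */
    boolean contains(String path) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(path, i))) {
                return false;
            }
        }
        return true;
    }

    /** Layout: bit count (int), then the bits zero-padded to ceil(size / 8) bytes. */
    byte[] serialize() {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + (size + 7) / 8);
        buf.putInt(size);
        buf.put(bits.toByteArray()); // may be shorter; the rest of the buffer stays zero
        return buf.array();
    }

    /** Rebuilds a filter; hashCount must match the value used when writing. */
    static PathBloomFilter deserialize(byte[] data, int hashCount) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int size = buf.getInt();
        byte[] raw = new byte[(size + 7) / 8];
        buf.get(raw);
        return new PathBloomFilter(size, hashCount, BitSet.valueOf(raw));
    }

    // Double hashing: the i-th probe is (h1 + i * h2) mod size.
    private int index(String path, int i) {
        int h1 = path.hashCode();
        int h2 = fnv1a(path);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the path.
    private static int fnv1a(String s) {
        int hash = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }
}
```

A writer would call add for each path in a tsfile and append the serialize() bytes to the end of the metadata. A reader would call deserialize and skip the file whenever contains returns false. A true result can still be a false positive, so the metadata must be checked in that case.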
