Hi Ben Krug,

+1 for adding the is_deleted column; we can then create a scheduled job to 
purge these old soft-deleted records after a retention period.
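
For illustration, a minimal sketch of the flag plus the cleanup (assuming
MySQL with the event scheduler enabled; the event name and the 30-day
retention window are placeholders, not an agreed design):

    -- Soft-delete flag; flipping it also bumps the last_update column
    -- from the proposal below, so the Coordinator picks the row up on
    -- its next incremental poll.
    ALTER TABLE druid_segments
      ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE;

    -- Periodically purge rows that have been soft-deleted long enough
    -- for every Coordinator to have observed the deletion.
    CREATE EVENT purge_deleted_segments
      ON SCHEDULE EVERY 1 DAY
      DO
        DELETE FROM druid_segments
        WHERE is_deleted = TRUE
          AND last_update < NOW() - INTERVAL 30 DAY;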

Regards,
Benedict Jin

On 2021/04/06 18:28:45, Ben Krug <ben.k...@imply.io> wrote: 
> Oh, that's easier than tombstones.  Flag is_deleted and update the
> timestamp (so the row gets pulled again).
> 
> On Tue, Apr 6, 2021 at 10:48 AM Tijo Thomas <tijothoma...@gmail.com> wrote:
> 
> > Abhishek,
> > Good point.  Do we need one more column to store whether it's deleted or not?
> >
> > On Tue, Apr 6, 2021 at 4:32 PM Abhishek Agarwal <abhishek.agar...@imply.io>
> > wrote:
> >
> > > If an entry is deleted from the metadata, how is the coordinator going to
> > > update its own state?
> > >
> > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe <itai.ya...@gmail.com> wrote:
> > >
> > > > Hey,
> > > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > > considerations here, but at first glance I like your proposal, as it
> > > > resembles the *tsColumn* in JDBC lookups (
> > > > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > > ).
> > > >
> > > > Anyway, just my 2 cents.
> > > >
> > > > Thanks!
> > > >           Itai
> > > >
> > > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin <asdf2...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Recently, the Coordinator in our company's Druid cluster has hit a
> > > > > performance bottleneck when pulling metadata. The main cause is the
> > > > > huge amount of metadata, which makes the full-table scan of the
> > > > > metadata store and the deserialization of its contents very slow.
> > > > > We have reduced the size of the full metadata through TTL,
> > > > > compaction, rollup, etc., but the effect has not been significant.
> > > > > Therefore, I want to design a scheme for the Coordinator to pull
> > > > > metadata incrementally, i.e., each poll fetches only newly added or
> > > > > changed metadata, so as to reduce both the query pressure on the
> > > > > metadata store and the cost of deserialization. The general idea is
> > > > > to add a last_update column to the druid_segments table to record
> > > > > the update time of each record. When querying the metadata table,
> > > > > we can then filter on last_update and avoid full-table scans.
> > > > > Moreover, both MySQL and PostgreSQL, the common metadata storage
> > > > > media, can keep such a timestamp column updated automatically
> > > > > (natively in MySQL, via a trigger in PostgreSQL).
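> > > > >
> > > > > For illustration, a minimal sketch of the idea (assuming MySQL;
> > > > > PostgreSQL would need a BEFORE UPDATE trigger to refresh the
> > > > > timestamp, and the @last_poll bookkeeping below is a placeholder,
> > > > > not a finished design):
> > > > >
> > > > >     -- Auto-updated modification time for each segment record.
> > > > >     ALTER TABLE druid_segments
> > > > >       ADD COLUMN last_update TIMESTAMP NOT NULL
> > > > >         DEFAULT CURRENT_TIMESTAMP
> > > > >         ON UPDATE CURRENT_TIMESTAMP;
> > > > >
> > > > >     -- Index so the incremental filter avoids a full-table scan.
> > > > >     CREATE INDEX idx_druid_segments_last_update
> > > > >       ON druid_segments (last_update);
> > > > >
> > > > >     -- Coordinator poll: fetch only rows changed since the last
> > > > >     -- successful pull, instead of scanning the whole table.
> > > > >     SELECT payload FROM druid_segments
> > > > >     WHERE last_update > @last_poll;
> > > > >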
> > > > > So, have you encountered this problem before? If so, how did you
> > > > > solve it? In addition, do you have any suggestions or comments on
> > > > > this incremental approach to pulling metadata? Please let me know,
> > > > > thanks a lot.
> > > > >
> > > > > Regards,
> > > > > Benedict Jin
> > > > >
> > > >
> > >
> >
> >
> > --
> > Thanks & Regards
> > Tijo Thomas
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org
