Hi Samarth Jain,

Thanks. The main reason is the huge amount of metadata, which makes the full-table scan of the metadata store, and the deserialization of that metadata, very slow. Yes, I have tried cleaning up the metadata.
Regards,
Benedict Jin

On 2021/04/06 17:20:26, Samarth Jain <samarth.j...@gmail.com> wrote:
> Hi Benedict,
>
> I am curious to understand which part of Druid you are seeing the
> slowness in. Is it the coordinator work of assigning segments to
> historicals, or is it the querying of segment information? Have you
> looked into CPU/network metrics for your metadata RDS? Maybe scaling up
> to a bigger instance would help. It would also be good to look at the
> query patterns and possibly tweak or add indexes to speed things up.
> Also, do you have the cleanup of metadata rows enabled (
> https://druid.apache.org/docs/latest/tutorials/tutorial-delete-data.html#run-a-kill-task
> and druid.coordinator.kill.on)? That should further help control the
> size of the druid_segments table.
>
> On Tue, Apr 6, 2021 at 8:08 AM Ben Krug <ben.k...@imply.io> wrote:
>
> > I suppose, if we were going down this path, something like tombstones
> > in Cassandra could be used, but it would increase the complexity
> > significantly. I.e., a new row is inserted with a deletion marker and
> > a timestamp, indicating that the corresponding row is deleted. Anyone
> > who scans the table then needs to check for tombstones as well and
> > apply that logic. After a configurable amount of time, both the
> > original row and the tombstone row can be cleaned up.
> >
> > Probably a lot of work and complexity for this one use case, though.
> >
> > On Tue, Apr 6, 2021 at 4:02 AM Abhishek Agarwal
> > <abhishek.agar...@imply.io> wrote:
> >
> > > If an entry is deleted from the metadata, how is the coordinator
> > > going to update its own state?
> > >
> > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe <itai.ya...@gmail.com> wrote:
> > >
> > > > Hey,
> > > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > > considerations here, but at first glance I like your proposal, as
> > > > it resembles the tsColumn in JDBC lookups (
> > > > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > > ).
> > > >
> > > > Anyway, just my 2 cents.
> > > >
> > > > Thanks!
> > > > Itai
> > > >
> > > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin <asdf2...@apache.org> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Recently, the Coordinator in our company's Druid cluster has hit
> > > > > a performance bottleneck when pulling metadata. The main reason
> > > > > is the huge amount of metadata, which makes the full-table scan
> > > > > of the metadata store and the deserialization of the metadata
> > > > > very slow. We have reduced the size of the full metadata through
> > > > > TTL, compaction, rollup, etc., but the effect has not been very
> > > > > significant. Therefore, I want to design a scheme for the
> > > > > Coordinator to pull metadata incrementally, i.e., each time the
> > > > > Coordinator pulls only newly added metadata, so as to reduce
> > > > > both the query pressure on the metadata store and the cost of
> > > > > deserializing metadata. The general idea is to add a column
> > > > > last_update to the druid_segments table to record the update
> > > > > time of each record. Then, when querying the metadata table, we
> > > > > can filter on the last_update column to avoid full-table scans.
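For concreteness, a minimal sketch of the change being proposed, assuming MySQL as the metadata store and the stock druid_segments schema (the index name and the :last_poll_time parameter are illustrative):

    -- Track when each segment row was last modified; MySQL refreshes
    -- the value automatically on every UPDATE.
    ALTER TABLE druid_segments
      ADD COLUMN last_update TIMESTAMP NOT NULL
          DEFAULT CURRENT_TIMESTAMP
          ON UPDATE CURRENT_TIMESTAMP;

    -- Index the new column so the incremental poll below does not
    -- degenerate into a full-table scan.
    CREATE INDEX idx_segments_last_update ON druid_segments (last_update);

    -- Coordinator poll: fetch only rows changed since the previous
    -- poll, instead of re-reading and re-deserializing every row.
    SELECT payload
      FROM druid_segments
     WHERE used = true
       AND last_update > :last_poll_time;

Note that rows deleted between polls would still be missed by such a query, which is the gap the tombstone discussion above is aimed at.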
> > > > > Moreover, whether the metadata store is MySQL or PostgreSQL,
> > > > > the timestamp field can be kept up to date automatically, in a
> > > > > way somewhat similar to triggers. So, have you encountered this
> > > > > problem before? If so, how did you solve it? And do you have any
> > > > > suggestions or comments on the incremental metadata pulling
> > > > > described above? Please let me know, thanks a lot.
> > > > >
> > > > > Regards,
> > > > > Benedict Jin
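On that last point, MySQL supports the automatic update natively via ON UPDATE CURRENT_TIMESTAMP (as sketched above), whereas PostgreSQL would need an actual trigger to get the same behavior. A rough sketch, with illustrative function and trigger names:

    -- PostgreSQL (11+) equivalent of MySQL's ON UPDATE CURRENT_TIMESTAMP:
    -- a trigger that touches last_update whenever a row is updated.
    ALTER TABLE druid_segments
      ADD COLUMN last_update TIMESTAMP NOT NULL DEFAULT now();

    CREATE FUNCTION touch_last_update() RETURNS trigger AS $$
    BEGIN
      NEW.last_update := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER druid_segments_touch_last_update
      BEFORE UPDATE ON druid_segments
      FOR EACH ROW EXECUTE FUNCTION touch_last_update();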