Hi Samarth Jain,

Thanks. The main reason is the huge amount of metadata, which makes the full-table scan of the metadata store, and the deserialization of that metadata, very slow. Yes, I have tried cleaning up the metadata.
Regards,
Benedict Jin

On 2021/04/06 17:20:26, Samarth Jain <samarth.j...@gmail.com> wrote:
> Hi Benedict,
>
> I am curious to understand which part of Druid you are seeing the
> slowness in. Is it the coordinator work of assigning segments to
> historicals, or is it the querying of segment information? Have you
> looked into CPU/network metrics for your metadata RDS? Maybe scaling up
> to a bigger instance would help. It would also be good to look at the
> query patterns and possibly tweak or add indexes to speed things up.
> Also, do you have the cleanup of metadata rows enabled (
> https://druid.apache.org/docs/latest/tutorials/tutorial-delete-data.html#run-a-kill-task
> and druid.coordinator.kill.on)? That should further help control the
> size of the druid_segments table.
>
> On Tue, Apr 6, 2021 at 8:08 AM Ben Krug <ben.k...@imply.io> wrote:
>
> > I suppose, if we were going down this path, something like tombstones
> > in Cassandra could be used, but it would increase the complexity
> > significantly. I.e., a new row is inserted with a deletion marker and
> > a timestamp, indicating that the corresponding row is deleted. Anyone
> > who scans the table then needs to check for tombstones as well and
> > apply that logic. After a configurable amount of time, both the
> > original row and the tombstone row can be cleaned up.
> >
> > Probably a lot of work and complexity for this one use case, though.
> >
> > On Tue, Apr 6, 2021 at 4:02 AM Abhishek Agarwal
> > <abhishek.agar...@imply.io> wrote:
> >
> > > If an entry is deleted from the metadata, how is the coordinator
> > > going to update its own state?
> > >
> > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe <itai.ya...@gmail.com> wrote:
> > >
> > > > Hey,
> > > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > > considerations here, but at first glance I like your proposal, as
> > > > it resembles the tsColumn in JDBC lookups (
> > > > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > > ).
> > > >
> > > > Anyway, just my 2 cents.
> > > >
> > > > Thanks!
> > > > Itai
> > > >
> > > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin <asdf2...@apache.org> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Recently, the Coordinator in our company's Druid cluster has hit
> > > > > a performance bottleneck when pulling metadata. The main reason
> > > > > is the huge amount of metadata, which makes the full-table scan
> > > > > of the metadata store and the deserialization of the metadata
> > > > > very slow. We have reduced the size of the full metadata through
> > > > > TTL, compaction, rollup, etc., but the effect has not been very
> > > > > significant. Therefore, I want to design a scheme for the
> > > > > Coordinator to pull metadata incrementally, i.e., each time the
> > > > > Coordinator pulls only newly added metadata, so as to reduce
> > > > > both the query pressure on the metadata store and the cost of
> > > > > deserializing metadata. The general idea is to add a column
> > > > > last_update to the druid_segments table to record the update
> > > > > time of each record. Then, when querying the metadata table, we
> > > > > can filter on the last_update column to avoid full-table scans.
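For concreteness, a minimal sketch of the change being proposed, assuming MySQL as the metadata store and the stock druid_segments schema (the index name and the :last_poll_time parameter are illustrative):

    -- Track when each segment row was last modified; MySQL refreshes
    -- the value automatically on every UPDATE.
    ALTER TABLE druid_segments
      ADD COLUMN last_update TIMESTAMP NOT NULL
          DEFAULT CURRENT_TIMESTAMP
          ON UPDATE CURRENT_TIMESTAMP;

    -- Index the new column so the incremental poll below does not
    -- degenerate into a full-table scan.
    CREATE INDEX idx_segments_last_update ON druid_segments (last_update);

    -- Coordinator poll: fetch only rows changed since the previous
    -- poll, instead of re-reading and re-deserializing every row.
    SELECT payload
      FROM druid_segments
     WHERE used = true
       AND last_update > :last_poll_time;

Note that rows deleted between polls would still be missed by such a query, which is the gap the tombstone discussion above is aimed at.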
> > > > > Moreover, whether the metadata store is MySQL or PostgreSQL,
> > > > > the timestamp field can be kept up to date automatically, in a
> > > > > way somewhat similar to triggers. So, have you encountered this
> > > > > problem before? If so, how did you solve it? And do you have any
> > > > > suggestions or comments on the incremental metadata pulling
> > > > > described above? Please let me know, thanks a lot.
> > > > >
> > > > > Regards,
> > > > > Benedict Jin
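On that last point, MySQL supports the automatic update natively via ON UPDATE CURRENT_TIMESTAMP (as sketched above), whereas PostgreSQL would need an actual trigger to get the same behavior. A rough sketch, with illustrative function and trigger names:

    -- PostgreSQL (11+) equivalent of MySQL's ON UPDATE CURRENT_TIMESTAMP:
    -- a trigger that touches last_update whenever a row is updated.
    ALTER TABLE druid_segments
      ADD COLUMN last_update TIMESTAMP NOT NULL DEFAULT now();

    CREATE FUNCTION touch_last_update() RETURNS trigger AS $$
    BEGIN
      NEW.last_update := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER druid_segments_touch_last_update
      BEFORE UPDATE ON druid_segments
      FOR EACH ROW EXECUTE FUNCTION touch_last_update();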