Apologies, I must be missing something very basic about how incremental indexing works.

A sketch is by itself an aggregator: it can absorb millions of updates before it exceeds its space limit or is flushed to disk. I assumed that the ingestion thread aggregates data into multiple sketches in parallel, that at query time a union operation is invoked to merge the relevant sketches based on the attributes of the query, and that once the union completes its result is returned to the user. But in such a scenario there is no need to call get before the union is completed.

This means there is another scenario in which a union is used and can be queried while the merge is still executing. Is this to maintain some in-memory hierarchy of aggregations? Or to create the snapshots that are flushed to disk?

A better understanding of the use case will help in presenting a better thread-safe solution.

Thanks,
Eshcar
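
P.S. To make the single-writer-single-reader case concrete, here is a minimal sketch of the two access patterns under discussion. Everything in it is an assumption made for illustration: `DistinctAggregator`, `LockedAggregator`, `PublishingAggregator` and `AggregatorDemo` are hypothetical names, an exact `HashSet` stands in for a theta sketch, and none of this is the Druid or sketches-core API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a sketch-backed aggregator (not a real Druid interface).
interface DistinctAggregator {
    void aggregate(long value); // called by the single ingestion thread
    long get();                 // called by a query thread, possibly at the same time
}

// Lock-based variant: every update and every read takes the same monitor.
final class LockedAggregator implements DistinctAggregator {
    private final Set<Long> seen = new HashSet<>();

    public synchronized void aggregate(long value) {
        seen.add(value);
    }

    public synchronized long get() {
        return seen.size();
    }
}

// Single-writer variant: only the writer touches the set; readers see
// the latest result published through an atomic field, without a lock.
final class PublishingAggregator implements DistinctAggregator {
    private final Set<Long> seen = new HashSet<>(); // writer-private state
    private final AtomicLong published = new AtomicLong();

    public void aggregate(long value) {
        if (seen.add(value)) {
            published.set(seen.size()); // publish the new count to readers
        }
    }

    public long get() {
        return published.get();
    }
}

final class AggregatorDemo {
    // Runs one ingestion thread and one query thread against the aggregator
    // and returns the final count once ingestion has finished.
    static long run(DistinctAggregator agg, int distinct) {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 2 * distinct; i++) {
                agg.aggregate(i % distinct); // each value is ingested twice
            }
        });
        AtomicBoolean monotonic = new AtomicBoolean(true);
        Thread reader = new Thread(() -> {
            long last = 0;
            while (writer.isAlive()) {
                long now = agg.get(); // query while ingestion is ongoing
                if (now < last) {
                    monotonic.set(false);
                }
                last = now;
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        if (!monotonic.get()) {
            throw new IllegalStateException("count went backwards");
        }
        return agg.get();
    }
}
```

Calling `AggregatorDemo.run(new PublishingAggregator(), 1000)` ingests on one thread while another polls `get`. The lock-free variant relies on there being exactly one writer; with several ingestion threads it would not be safe as written.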

On Wednesday, July 11, 2018, 7:51:24 PM GMT+3, Gian Merlino <g...@apache.org> wrote:

Hi Eshcar,

> But even in a single-writer-single-reader scenario removing the lock can increase the throughput of accesses to the object.

Definitely worth trying this out, imo.

> However, I don't understand why is the union object read before the result is ready.

It's used as part of incremental indexing: the idea is that we create aggregates during ingestion time and we want those to be queryable even while ingestion is still ongoing. So the ingestion thread will be calling "aggregate" and a query thread will be calling "get" potentially simultaneously.

On Wed, Jul 11, 2018 at 1:04 AM Eshcar Hillel <esh...@oath.com.invalid> wrote:

> Thanks Gian,
> This is also my understanding. But even in a single-writer-single-reader scenario removing the lock can increase the throughput of accesses to the object.
> If the union is only used to produce the result at query time then removing the lock would not affect ingestion throughput, but could decrease query latency. However, I don't understand why is the union object read before the result is ready.
>
> On Tuesday, July 10, 2018, 8:13:36 PM GMT+3, Gian Merlino <g...@apache.org> wrote:
>
> Hi Eshcar,
>
> To my knowledge, in the Druid Aggregator and BufferAggregator interfaces, the main place where concurrency happens is that "aggregate" and "get" may be called simultaneously during realtime ingestion. So if there would be a benefit from improving concurrency it would probably end up in that area.
>
> On Tue, Jul 10, 2018 at 2:10 AM Eshcar Hillel <esh...@oath.com.invalid> wrote:
>
> > Hi All,
> > My name is Eshcar Hillel from Oath research. I'm currently working with Lee Rhodes on committing a new concurrent implementation of the theta sketch to the sketches-core library. I was wondering whether this implementation can help boost the union operation that is applied to multiple sketches at query time in druid. From what I see in the code the sketch aggregator uses the SynchronizedUnion implementation, which basically uses a lock at every single access (update/read) of the union operation. We believe a thread-safe implementation of the union operation can help decrease the inherent overhead of the lock.
> > I will be happy to join the meeting today and briefly discuss this option.
> > Thanks,
> > Eshcar