Re: Question about sketches aggregation in druid

Eshcar Hillel Sun, 15 Jul 2018 00:11:34 -0700

Apologies, I must be missing something very basic about how incremental indexing
works. A sketch is itself an aggregator: it can absorb millions of
updates before it exceeds its space limit or is flushed to disk.
I assumed the ingestion thread aggregates data into multiple sketches in
parallel; then, at query time, a union operation is invoked to merge the relevant
sketches based on the attributes of the query, and once the union is complete
its result is returned to the user. But in such a scenario there is no need to
call get before the union is complete.
This suggests there is another scenario in which the union is queried
while the merge is still in progress. Is this to maintain some in-memory
hierarchy of aggregations, or to create the snapshots that are flushed to
disk?
A better understanding of the use case will help me propose a better
thread-safe solution.
Thanks,
Eshcar
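The scenario Eshcar assumes (independent sketches built during ingestion, merged only at query time) can be illustrated with a minimal K-Minimum-Values (KMV) sketch, a simple member of the theta-sketch family. This is a hypothetical sketch under stated assumptions: the class `KmvSketch`, its methods, and the SplitMix64-based hash are illustrative choices, not the sketches-core or Druid API. It shows why such a sketch has bounded memory (at most `k` retained hashes) and why the union of two sketches is itself a sketch.

```java
import java.util.TreeSet;

/**
 * Minimal K-Minimum-Values (KMV) sketch, a close relative of the theta sketch.
 * Hypothetical illustration only: this is NOT the sketches-core API.
 */
final class KmvSketch {
    private final int k;
    // The k smallest normalized hash values seen so far, each in [0, 1).
    private final TreeSet<Double> minHashes = new TreeSet<>();

    KmvSketch(int k) {
        if (k < 2) throw new IllegalArgumentException("k must be >= 2");
        this.k = k;
    }

    /** Ingestion path: absorbs one item; memory never exceeds k retained hashes. */
    void update(long item) {
        insertHash(hash(item));
    }

    private void insertHash(double h) {
        // Add h; if that pushes the set past k entries, evict the largest
        // so that only the k smallest hashes remain.
        if (minHashes.add(h) && minHashes.size() > k) {
            minHashes.pollLast();
        }
    }

    /** Distinct-count estimate: exact below k entries, else (k - 1) / theta. */
    double estimate() {
        if (minHashes.size() < k) return minHashes.size();
        double theta = minHashes.last(); // the k-th smallest hash
        return (k - 1) / theta;
    }

    int retained() {
        return minHashes.size();
    }

    /** Query path: merge two sketches built independently (e.g. by parallel ingestion). */
    static KmvSketch union(KmvSketch a, KmvSketch b) {
        KmvSketch result = new KmvSketch(Math.min(a.k, b.k));
        for (double h : a.minHashes) result.insertHash(h);
        for (double h : b.minHashes) result.insertHash(h);
        return result;
    }

    // SplitMix64 finalizer mixes each long into a well-spread 64-bit value;
    // its top 53 bits are then scaled into [0, 1).
    private static double hash(long x) {
        long z = x + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        return (z >>> 11) * 0x1.0p-53;
    }
}
```

For instance, two ingestion threads could each feed their own `KmvSketch`, and a query would call `KmvSketch.union` on them; in this flow nothing reads a sketch until the union has returned, which is exactly why a concurrent `get` seemed unnecessary.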


On Wednesday, July 11, 2018, 7:51:24 PM GMT+3, Gian Merlino
<[email protected]> wrote:

Hi Eshcar,

> But even in a single-writer-single-reader scenario, removing the lock can
> increase the throughput of accesses to the object.

Definitely worth trying this out, imo.

> However, I don't understand why the union object is read before the
> result is ready.

It's used as part of incremental indexing: the idea is that we create
aggregates during ingestion time and we want those to be queryable even
while ingestion is still ongoing. So the ingestion thread will be calling
"aggregate" and a query thread will be calling "get" potentially
simultaneously.
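The access pattern Gian describes can be illustrated with a minimal sketch, assuming a simplified, hypothetical aggregator class: `LockedDistinctAggregator` is not Druid's `Aggregator` interface, and an exact `HashSet` stands in for the theta sketch so the example stays self-contained. Both methods take the same lock, in the style of the `SynchronizedUnion` wrapper mentioned earlier in the thread, so a query thread can safely call `get()` while the ingestion thread is still calling `aggregate()`.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for a Druid-style aggregator (not Druid's real interface).
 * Every access takes the same lock, so reads and writes never overlap.
 */
final class LockedDistinctAggregator {
    private final Object lock = new Object();
    private final Set<Long> seen = new HashSet<>(); // stand-in for a sketch

    /** Called by the ingestion thread for every incoming row. */
    void aggregate(long value) {
        synchronized (lock) {
            seen.add(value);
        }
    }

    /** Called by a query thread, possibly while ingestion is still running. */
    long get() {
        synchronized (lock) {
            return seen.size();
        }
    }
}

final class IngestWhileQuerying {
    public static void main(String[] args) throws InterruptedException {
        LockedDistinctAggregator agg = new LockedDistinctAggregator();
        Thread ingest = new Thread(() -> {
            for (long i = 0; i < 1_000_000; i++) agg.aggregate(i % 250_000);
        });
        ingest.start();
        // The query side observes partial results while ingestion is ongoing.
        long last = 0;
        while (ingest.isAlive()) {
            long now = agg.get();
            if (now < last) throw new IllegalStateException("count went backwards");
            last = now;
        }
        ingest.join();
        System.out.println("final distinct count: " + agg.get()); // prints 250000
    }
}
```

Every `aggregate` call pays for the lock even when no query is running, which is the overhead the thread is discussing.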

On Wed, Jul 11, 2018 at 1:04 AM Eshcar Hillel <[email protected]>
wrote:

> Thanks Gian,
> This is also my understanding. But even in a single-writer-single-reader
> scenario, removing the lock can increase the throughput of accesses to the
> object.
> If the union is only used to produce the result at query time, then
> removing the lock would not affect ingestion throughput, but could decrease
> query latency. However, I don't understand why the union object is read
> before the result is ready.
>    On Tuesday, July 10, 2018, 8:13:36 PM GMT+3, Gian Merlino <
> [email protected]> wrote:
>
>  Hi Eshcar,
>
> To my knowledge, in the Druid Aggregator and BufferAggregator interfaces,
> the main place where concurrency happens is that "aggregate" and "get" may
> be called simultaneously during realtime ingestion. So if there would be a
> benefit from improving concurrency it would probably end up in that area.
>
> On Tue, Jul 10, 2018 at 2:10 AM Eshcar Hillel <[email protected]>
> wrote:
>
> > Hi All,
> > My name is Eshcar Hillel from Oath research. I'm currently working with
> > Lee Rhodes on committing a new concurrent implementation of the theta
> > sketch to the sketches-core library. I was wondering whether this
> > implementation could help speed up the union operation that is applied to
> > multiple sketches at query time in Druid. From what I see in the code, the
> > sketch aggregator uses the SynchronizedUnion implementation, which
> > acquires a lock on every single access (update/read) of the union
> > operation. We believe a thread-safe union implementation that does not
> > rely on a single lock can reduce this inherent locking overhead.
> > I will be happy to join the meeting today and briefly discuss this option.
> > Thanks,
> > Eshcar
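One way to read the "remove the lock" idea for the single-writer-single-reader case is sketched below, under stated assumptions: the class name, the `publishEvery` parameter, and the count-only snapshot are hypothetical, an exact `HashSet` again stands in for a sketch, and this is not the concurrent theta sketch being contributed to sketches-core. The writer owns all mutable state and periodically publishes a result through a `volatile` field, so `get()` never blocks; the trade-off is that a reader may miss up to the last `publishEvery - 1` updates.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical single-writer aggregator whose readers never take a lock.
 * Not the sketches-core concurrent theta sketch; an illustration of the idea only.
 */
final class PublishingDistinctAggregator {
    private final int publishEvery;
    private final Set<Long> local = new HashSet<>(); // touched only by the writer thread
    private int sinceLastPublish = 0;                // writer-only bookkeeping
    private volatile long published = 0;             // the only field readers touch

    PublishingDistinctAggregator(int publishEvery) {
        if (publishEvery < 1) throw new IllegalArgumentException("publishEvery must be >= 1");
        this.publishEvery = publishEvery;
    }

    /** Must only be called from the single ingestion (writer) thread. */
    void aggregate(long value) {
        local.add(value);
        if (++sinceLastPublish >= publishEvery) flush();
    }

    /** Writer side: make every update so far visible to readers. */
    void flush() {
        published = local.size(); // volatile write publishes the new value
        sinceLastPublish = 0;
    }

    /** Lock-free read from any thread; may miss up to publishEvery - 1 recent updates. */
    long get() {
        return published;
    }
}
```

With a real sketch, the published value would presumably be a compact, immutable copy of the sketch rather than a single number, so that queries could still run unions over it without touching the writer's state.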
