I did some thinking, and alternative 2 would not support the
single-writer-multiple-readers scenario in Druid's incremental index, which is
the common case. So this leaves choosing between alternatives 1 and 3.

Can anyone point out advantages of having a union API to answer queries rather
than a sketch? The only reason I can think of is being backward compatible with
the current implementation, but this might be a good enough reason.

On Sunday, June 30, 2019, 1:13:30 PM GMT+3, Eshcar Hillel
<esh...@verizonmedia.com> wrote:
 
 Hi Everyone,
As some of you may recall, a year ago we had a conversation on the mailing
list regarding the synchronization of sketches:
https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E

Currently, the implementation of the concurrent theta sketch is committed to
the datasketches library. Details of the design and API can be found here:
https://datasketches.github.io/docs/Theta/ConcurrentThetaSketch.html

We would like to continue with implementing a concurrent union operation. For
this I have opened an issue suggesting 3 design alternatives:
https://github.com/apache/incubator-datasketches-java/issues/263

With Druid being one of the main users of data sketches, and specifically the
union set operation, the input of the Druid community is valuable.

The advantage of a concurrent union implementation is that it is thread safe,
namely it allows concurrent reads and updates of the union object. The
application does not need to wrap the union implementation with a synchronized
call, as is currently done in
https://github.com/apache/incubator-druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/SketchAggregator.java

The core concept of a concurrent implementation is separating the object into
local objects and a shared object, where the data flows from local to shared.

The 3 design alternatives suggest different separations of read and write
accesses:
1) write only to local (union), read only from shared (union)
2) write and read only from local (union)
3) write only to local (union), read only from shared (sketch)
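
To make the local/shared split concrete, here is a minimal toy sketch of the
"write only to local, read only from shared" shape that alternatives 1 and 3
have in common. It is not the datasketches API: the names LocalSharedDemo,
SharedCounter, and LocalBuffer are hypothetical, and an exact set of longs
stands in for a real theta sketch or union. Each writer thread owns an
unsynchronized local buffer and propagates it to the shared object once the
buffer reaches an assumed size limit.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy illustration only; an exact distinct set replaces a real sketch. */
public class LocalSharedDemo {

    /** Shared object: readers query it, writers only propagate into it. */
    static final class SharedCounter {
        private final Set<Long> items = ConcurrentHashMap.newKeySet();

        /** Called by writers when they propagate a local buffer. */
        void merge(Set<Long> batch) {
            items.addAll(batch);
        }

        /** Read path: touches only the shared object. */
        int estimate() {
            return items.size();
        }
    }

    /** Local object: owned by one writer thread, so it needs no locking. */
    static final class LocalBuffer {
        private final Set<Long> buffer = new HashSet<>();
        private final SharedCounter shared;
        private final int limit; // hypothetical propagation threshold

        LocalBuffer(SharedCounter shared, int limit) {
            this.shared = shared;
            this.limit = limit;
        }

        /** Write path: touches only the local object until it fills up. */
        void update(long value) {
            buffer.add(value);
            if (buffer.size() >= limit) {
                flush();
            }
        }

        /** Data flows from local to shared, never the other way. */
        void flush() {
            shared.merge(buffer);
            buffer.clear();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter shared = new SharedCounter();
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final long base = t * 1000L; // disjoint value range per writer
            writers[t] = new Thread(() -> {
                LocalBuffer local = new LocalBuffer(shared, 64);
                for (long i = 0; i < 1000; i++) {
                    local.update(base + i);
                }
                local.flush(); // propagate whatever is still buffered
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(shared.estimate()); // prints 4000
    }
}
```

In this toy, a reader calling estimate() while writers are active can miss
values still sitting in a local buffer; no caller needs a synchronized block.
Whether the shared object exposes a union or a sketch is exactly the
difference between alternatives 1 and 3, and the toy does not model it.
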
I would greatly appreciate it if you could give your feedback in the issue I
opened
https://github.com/apache/incubator-datasketches-java/issues/263
so we can make the best decision (also) for Druid.

Thanks,
Eshcar
