Hey Eshcar, I see Himanshu wrote a note in https://github.com/apache/incubator-datasketches-java/issues/263, and I added a little bit of extra info as well. Hope it helps!
On Tue, Jul 2, 2019 at 7:06 AM Eshcar Hillel <esh...@verizonmedia.com.invalid> wrote:

> I did some thinking, and alternative 2 would not support a
> single-writer-multiple-readers scenario in Druid's incremental index,
> which is the common case. That leaves a choice between alternatives 1 and 3.
> Can anyone point out advantages of having a union API, rather than a sketch,
> answer queries? The only reason I can think of is backward compatibility
> with the current implementation, but that might be a good enough reason.
>
> On Sunday, June 30, 2019, 1:13:30 PM GMT+3, Eshcar Hillel <esh...@verizonmedia.com> wrote:
>
> Hi Everyone,
>
> As some of you may recall, a year ago we had a conversation on the
> mailing list regarding the synchronization of sketches:
> https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E
> The implementation of the concurrent theta sketch is now committed to the
> datasketches library. Details of the design and API can be found here:
> https://datasketches.github.io/docs/Theta/ConcurrentThetaSketch.html
>
> We would like to continue by implementing a concurrent union operation.
> For this I have opened an issue suggesting 3 design alternatives:
> https://github.com/apache/incubator-datasketches-java/issues/263
>
> With Druid being one of the main users of data sketches, and specifically
> of the union set operation, the input of the Druid community is valuable.
> The advantage of a concurrent union implementation is that it is thread-safe,
> namely it allows concurrent reads and updates of the union object.
> The application does not need to wrap the union implementation in a
> synchronized call, as is currently done in
> https://github.com/apache/incubator-druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/theta/SketchAggregator.java
>
> The core concept of a concurrent implementation is separating the object into
> local objects and a shared object, where data flows from the local objects to
> the shared one. The 3 design alternatives suggest different separations of
> read and write accesses:
>
> 1) write only to local (union), read only from shared (union)
> 2) write and read only from local (union)
> 3) write only to local (union), read only from shared (sketch)
>
> I would greatly appreciate it if you could give your feedback in the issue I
> opened, https://github.com/apache/incubator-datasketches-java/issues/263, so
> we can make the best decision (also) for Druid.
>
> Thanks,
> Eshcar
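For anyone following the thread, the local/shared split described above can be illustrated with a toy Java sketch. Everything in it is hypothetical: `ToyConcurrentUnion` is an illustrative name, not part of the datasketches-java API, and an exact `HashSet<Long>` stands in for a theta sketch. It only shows the data flow that alternatives 1 and 3 share (writers update a thread-local object that is periodically propagated to a shared object; readers query only the shared object), not the library's actual algorithms, propagation policy, or memory bounds.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical toy illustration of "write only to local, read only from shared".
 * An exact HashSet stands in for a sketch; this is NOT the DataSketches API.
 */
public class LocalSharedUnionDemo {

    static final class ToyConcurrentUnion {
        private final int propagateThreshold;
        // Shared state: an immutable snapshot, replaced atomically on each propagation.
        private final AtomicReference<Set<Long>> shared = new AtomicReference<>(Set.<Long>of());
        // Each writer thread gets its own local buffer, so plain updates take no lock.
        private final ThreadLocal<Set<Long>> local = ThreadLocal.withInitial(HashSet::new);

        ToyConcurrentUnion(int propagateThreshold) {
            this.propagateThreshold = propagateThreshold;
        }

        /** Write path: touches only the calling thread's local buffer. */
        void update(long item) {
            Set<Long> buf = local.get();
            buf.add(item);
            if (buf.size() >= propagateThreshold) {
                flush();
            }
        }

        /** Propagates the calling thread's local buffer into the shared snapshot. */
        void flush() {
            Set<Long> buf = local.get();
            if (buf.isEmpty()) {
                return;
            }
            // Copy-on-write merge; updateAndGet retries via CAS if another writer raced us.
            shared.updateAndGet(old -> {
                Set<Long> merged = new HashSet<>(old);
                merged.addAll(buf);
                return Set.copyOf(merged);
            });
            buf.clear();
        }

        /** Read path: sees only propagated data; unflushed local updates are invisible. */
        long estimate() {
            return shared.get().size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyConcurrentUnion union = new ToyConcurrentUnion(64);
        // Two writers with overlapping ranges; each flushes its remainder before exiting.
        Thread w1 = new Thread(() -> {
            for (long i = 0; i < 1000; i++) union.update(i);
            union.flush();
        });
        Thread w2 = new Thread(() -> {
            for (long i = 500; i < 1500; i++) union.update(i);
            union.flush();
        });
        w1.start();
        w2.start();
        w1.join();
        w2.join();
        System.out.println("distinct items = " + union.estimate()); // prints "distinct items = 1500"
    }
}
```

Because readers only see what has been propagated, a reader may briefly lag behind the writers. Compared with wrapping every call in `synchronized`, as the quoted message says Druid's `SketchAggregator` does, writers here contend only during the occasional propagation step rather than on every update.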