leerho commented on issue #446: URL: https://github.com/apache/datasketches-java/issues/446#issuecomment-1581773089
Here is another, very crude, solution. If you just want a rough idea of the NDVs (number of distinct values) per bin, you could do this:

From the histogram produced by the KLL sketch, compute the fractional density of each bin, i.e., the fraction of all values, duplicates included, that fall into that bin. Then, with a parallel HLL sketch estimating the NDV of the entire stream, compute the fraction of the stream that consists of duplicates. Finally, under the huge assumption that the duplicates are roughly uniformly distributed across the ranks, you can guesstimate the NDV in each bin.

(I put this in not just for its humor value: this is almost exactly what political pollsters do!)
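The arithmetic behind that guesstimate can be sketched as follows. This is a minimal illustration, not DataSketches code. The function name `estimate_ndv_per_bin` is hypothetical, and the inputs are assumed to already be extracted from the sketches: `pmf` holds the per-bin fractions from the KLL histogram, `n` is the total item count (duplicates included), and `ndv` is the HLL estimate for the whole stream.

```python
from typing import Sequence


def estimate_ndv_per_bin(pmf: Sequence[float], n: float, ndv: float) -> list[float]:
    """Crude per-bin distinct-value estimate (hypothetical helper).

    pmf: fraction of all items (duplicates included) in each bin,
         assumed to come from a KLL histogram.
    n:   total number of items in the stream.
    ndv: estimated distinct values in the whole stream, assumed to come
         from a parallel HLL sketch.

    Assumes duplicates are spread uniformly across ranks, so every bin
    has the same distinct-to-total ratio as the stream as a whole.
    """
    if n <= 0:
        return [0.0 for _ in pmf]
    # Fraction of the stream that is distinct (1 - duplicate fraction).
    # Capped at 1.0 because HLL error can push ndv slightly above n.
    distinct_fraction = min(ndv / n, 1.0)
    # Items in each bin, scaled by the stream-wide distinct fraction.
    return [p * n * distinct_fraction for p in pmf]
```

For example, with a PMF of `[0.5, 0.3, 0.2]` over 1,000 items and an HLL estimate of 200 distinct values, 20% of each bin's items are counted as distinct. That gives roughly 100, 60, and 40 distinct values per bin. Note that `n` cancels out except in the cap. Under the uniformity assumption, the estimate reduces to each bin's fraction times the stream-wide NDV, which is exactly why the result is only as good as that assumption.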
