[
https://issues.apache.org/jira/browse/DATASKETCHES-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309350#comment-17309350
]
Pavel Vesely commented on DATASKETCHES-10:
------------------------------------------
Jan,
yes, I'm working on streaming algorithms, currently mainly on geometric
streaming, although mostly from the theoretical (mathematical) perspective ;)
I'm aware of the MomentsSketch work (though I haven't seen the msketch repo).
The main difference from ReqSketch and t-digest is that it doesn't give better
accuracy for the tails, so we didn't include it in our comparison. (Like
t-digest, it's a heuristic suited to processing data that actually comes
from some "nice" input distribution.)
As for a repo for all streaming algorithms (I suppose for a particular problem
like quantile estimation or counting distinct items): it makes some sense,
although one would have to prepare many different benchmark tests as each
algorithm may be suitable for a certain scenario. Looks like a nice project for
a student :)
> Double precision by default?
> ----------------------------
>
> Key: DATASKETCHES-10
> URL: https://issues.apache.org/jira/browse/DATASKETCHES-10
> Project: Apache Datasketches
> Issue Type: Improvement
> Reporter: Jan Prach
> Priority: Major
>
> Would it make sense to use double (instead of float) for all sketches by
> default?
> It would take more memory (less than 2x), have the same speed, and need twice
> the storage, or even the same storage if one is fine with float precision.
> Most importantly it would be far more useful.
> I'm trying to build a generic profiler. The first simple dataset had a
> couple of date and timestamp columns. The obvious choice is to convert them
> to epoch seconds. I spent a full day on weird messages only to realize that
> KllFloatsSketch, ReqSketch, etc. are all based on floats. That means 24-bit
> precision. But epoch seconds today are 31-bit numbers.
> Why not always double?
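
To make the precision point in the quoted description concrete, here is a
minimal sketch in Python. It is not DataSketches code: the helper names
to_float32 and float32_spacing and the sample timestamp are made up for
illustration, and the only assumption is that the sketches store values as
IEEE 754 single-precision floats, as Java's float does.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a number to the nearest IEEE 754 single-precision value."""
    # Packing as a 4-byte float and unpacking again performs the rounding.
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def float32_spacing(value: float) -> float:
    """Gap between adjacent float32 values near a positive, normal value."""
    # frexp gives value = m * 2**e with 0.5 <= m < 1; a 24-bit significand
    # therefore resolves steps of 2**(e - 24) in that binade.
    _, exponent = math.frexp(value)
    return 2.0 ** (exponent - 24)


# Hypothetical timestamp (epoch seconds, March 2021); needs 31 bits.
epoch_seconds = 1_616_000_001

print(to_float32(epoch_seconds))       # 1616000000.0
print(float32_spacing(epoch_seconds))  # 128.0
```

Under that assumption, timestamps that differ by less than about two minutes
can collapse to the same stored value, while a double (53-bit significand)
represents every epoch second in this range exactly.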