[ 
https://issues.apache.org/jira/browse/DATASKETCHES-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308661#comment-17308661
 ] 

Jan Prach commented on DATASKETCHES-10:
---------------------------------------

We can close this ticket, right?

I've seen dozens of papers, algorithms and even more implementations. Sadly I 
will probably end up forking one of the implementations yet again.

[~leerho] since we're talking - a question about ItemsSketch - have you 
considered Homem's paper? I found several references to it (like 
https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/topk/)
 but I have not found mentions of it on datasketches. I was just wondering why 
some libraries prefer one or the other.

[~veselyp] are you going to work more on streaming algorithms? Your t-digest 
fork has some nice stuff. There is a similar (even bigger) collection at 
https://github.com/stanford-futuredata/msketch/tree/master/analysis. Each of 
them seems to have been used only once, for a single paper. Would it make 
sense to have one repo that could serve as a benchmark for all streaming 
algorithms? One could replicate its results in minutes or test her own 
implementation in an hour. It could get a lot of citations over time ;-)

> Double precision by default?
> ----------------------------
>
>                 Key: DATASKETCHES-10
>                 URL: https://issues.apache.org/jira/browse/DATASKETCHES-10
>             Project: Apache Datasketches
>          Issue Type: Improvement
>            Reporter: Jan Prach
>            Priority: Major
>
> Would it make sense to use double (instead of float) for all sketches by 
> default?
> It would take more memory (less than 2x), have the same speed, and take 
> twice the storage, or even the same storage if one is fine with float 
> precision. Most importantly, it would be far more useful.
> I'm trying to build a generic profiler. In the first simple dataset there 
> were a couple of date and timestamp columns. The obvious choice is to 
> convert them to epoch seconds. I spent a full day on weird messages, only to 
> realize that KllFloatsSketch, ReqSketch, etc. are all based on floats. That 
> means 24-bit precision. But epoch seconds today are 31-bit numbers.
> Why not always double?
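
The precision gap described in the issue can be checked with a minimal sketch 
in plain Java. It does not call any DataSketches API; the class name, method 
names, and the sample timestamp are assumptions made up for illustration. A 
float has a 24-bit significand, so values between 2^30 and 2^31 (where current 
epoch seconds fall) can only be represented in steps of 128 seconds, while a 
double holds them exactly.

```java
public class FloatEpochPrecision {

    // Round-trips epoch seconds through a float, mimicking what happens
    // when the value is handed to any float-based data structure.
    public static long throughFloat(long epochSeconds) {
        return (long) (float) epochSeconds;
    }

    // Same round trip through a double (53-bit significand).
    public static long throughDouble(long epochSeconds) {
        return (long) (double) epochSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical sample value: one second past midnight UTC, 2021-03-25.
        long t = 1_616_630_401L;

        System.out.println("original:   " + t);
        System.out.println("via float:  " + throughFloat(t));
        System.out.println("via double: " + throughDouble(t));

        // Spacing between adjacent floats at this magnitude, in seconds.
        System.out.println("float step: " + Math.ulp((float) t));
    }
}
```

With this sample value the float round trip lands on a neighbouring multiple 
of 128 rather than on the original second, whereas the double round trip 
returns the input unchanged.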



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
