[jira] [Commented] (DATASKETCHES-10) Double precision by default?

Jan Prach (Jira) Fri, 19 Mar 2021 10:38:09 -0700


    [ 
https://issues.apache.org/jira/browse/DATASKETCHES-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305073#comment-17305073
 ]


Jan Prach commented on DATASKETCHES-10:
---------------------------------------

Yes, I understand there may be use cases for that.

Storage is an easily solvable problem though - for example, the KLL sketch could 
have "byte[] toByteArray(Boolean floatPrecision)". The in-memory class would 
take considerably more memory though, at least unless there were two 
implementations, one per precision (which would make the jar bigger, but is possible).
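
To illustrate the storage idea, here is a minimal sketch of a serializer that keeps doubles in memory and optionally narrows them to floats when writing. The class name NarrowingSerializer and the method shape are hypothetical and are not part of the DataSketches API. A real sketch would also have to serialize its preamble and level structure.

```java
import java.nio.ByteBuffer;

public class NarrowingSerializer {

    // Hypothetical helper: writes the retained items either as 8-byte doubles
    // or, when floatPrecision is true, as 4-byte floats (lossy narrowing).
    static byte[] toByteArray(double[] items, boolean floatPrecision) {
        int width = floatPrecision ? Float.BYTES : Double.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(items.length * width);
        for (double item : items) {
            if (floatPrecision) {
                buf.putFloat((float) item);
            } else {
                buf.putDouble(item);
            }
        }
        return buf.array();
    }

    public static void main(String[] args) {
        double[] items = {0.25, 12.5, 1_616_175_489.0};
        System.out.println("double storage: " + toByteArray(items, false).length + " bytes");
        System.out.println("float storage:  " + toByteArray(items, true).length + " bytes");
    }
}
```

With this approach the serialized size halves only when the caller explicitly accepts float precision. The in-memory representation stays double either way.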

Yes, the accuracy of float is more than enough for an approximate algorithm. It 
works just fine if the values are of a certain scale and centered around zero. 
That may be true for something like server latency. For a restricted domain - like a 
metrics server - it is (probably) fine. For more generic data - any kind of 
database, stream processing, batch data processing - it is more problematic.
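
The scale problem can be shown with plain Java, without any sketch library. The example below is only an illustration: the class name FloatEpochPrecision is made up, and the timestamp is simply the send time of this message expressed as epoch seconds. It round-trips the value through float, which is what effectively happens when a long is fed to a float-based sketch.

```java
public class FloatEpochPrecision {

    // Round-trips a value through float, mimicking an update to a float-based sketch.
    static long throughFloat(long value) {
        return (long) (float) value;
    }

    public static void main(String[] args) {
        long epochSeconds = 1_616_175_489L;      // 2021-03-19 17:38:09 UTC
        long oneMinuteLater = epochSeconds + 60;

        // float has a 24-bit significand, so values in [2^30, 2^31) are 128 apart.
        System.out.println("ulp: " + Math.ulp((float) epochSeconds));
        System.out.println("original:     " + epochSeconds);
        System.out.println("as float:     " + throughFloat(epochSeconds));
        System.out.println("+60s as float: " + throughFloat(oneMinuteLater));
    }
}
```

At this magnitude adjacent floats are 128 seconds apart, so both timestamps map to the same float value (1616175488) and a quantile query can no longer tell them apart.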

But sure, if you don't like it, don't do it.

I'm going to check out the OrigQuantilesSketch. Thanks for the tip!

> Double precision by default?
> ----------------------------
>
>                 Key: DATASKETCHES-10
>                 URL: https://issues.apache.org/jira/browse/DATASKETCHES-10
>             Project: Apache Datasketches
>          Issue Type: Improvement
>            Reporter: Jan Prach
>            Priority: Major
>
> Would it make sense to use double (instead of float) for all sketches by 
> default?
> It would take (less than 2x) more memory, have the same speed, and take twice the 
> storage. Or even the same storage if one is fine with float precision. 
> Most importantly it would be far more useful.
> I'm trying to build a generic profiler. In the first simple dataset there were a 
> couple of date and timestamp columns. The obvious choice is to convert them 
> to epoch seconds. I spent a full day on weird messages only to realize that 
> KllFloatsSketch, ReqSketch, etc. are all based on floats. That means 24 bit 
> precision. But epoch seconds today are 31 bit numbers.
> Why not always double?



