geonove commented on issue #457:
URL:
https://github.com/apache/datasketches-cpp/issues/457#issuecomment-3757012342
I’ve been working on the characterization of `DDSketch` and comparing it
against `t-digest` and `REQ sketch`. After digging into the literature and
implementations, one compelling reason to consider adding `DDSketch` to the
library is that it provides an accuracy guarantee that the others do not.
Specifically, `DDSketch` guarantees bounded relative error in value space: a
returned quantile value stays within a configured relative tolerance of the
true quantile value. In contrast, `REQ sketch` provides guarantees on the
relative rank error, and `t-digest` has no formal worst-case bounds.
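
To make the value-space guarantee concrete, here is a minimal sketch of the logarithmic bucket mapping commonly described for `DDSketch`. It is an illustration under stated assumptions, not the API of any library: the function names (`gamma_for`, `bucket_index`, `bucket_estimate`, `relative_value_error`) are hypothetical, only positive values are handled, and no bucket collapsing is modeled.

```python
import math


def gamma_for(alpha: float) -> float:
    """Bucket growth factor for a target relative accuracy alpha (0 < alpha < 1)."""
    return (1 + alpha) / (1 - alpha)


def bucket_index(value: float, alpha: float) -> int:
    """Index i of the bucket (gamma**(i-1), gamma**i] containing a positive value."""
    return math.ceil(math.log(value) / math.log(gamma_for(alpha)))


def bucket_estimate(index: int, alpha: float) -> float:
    """Representative value of bucket i, chosen so both bucket edges are within alpha."""
    g = gamma_for(alpha)
    return 2 * g ** index / (g + 1)


def relative_value_error(true_value: float, estimate: float) -> float:
    """Relative error measured in value space (not rank space)."""
    return abs(estimate - true_value) / true_value


alpha = 0.01  # hypothetical accuracy setting for this illustration
for x in (0.003, 1.0, 42.0, 1e6):
    est = bucket_estimate(bucket_index(x, alpha), alpha)
    # Small slack absorbs floating-point rounding at bucket boundaries.
    assert relative_value_error(x, est) <= alpha + 1e-9
```

Because any value in a bucket is replaced by that bucket's representative, the error bound depends only on `alpha` and not on the stream size or the data distribution. A rank-error guarantee, by contrast, says nothing direct about how far the returned value is from the true one.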
The figure below shows the `relative quantile error` as a function of
`stream size` for the three sketches:
<img width="2494" height="1292" alt="Image"
src="https://github.com/user-attachments/assets/f0a526d7-9321-4b46-8d41-2eb9ca261da6"
/>
Since `REQ sketch` seems to outperform `DDSketch` for sufficiently large
streams, I'd like to compare their memory footprints as well, to get a full
picture. The memory profiling job takes a while to run (hours or maybe days),
even after the
[optimization](https://github.com/apache/datasketches-characterization/pull/95)
I did. Does anyone have the data used to produce the [memory
plots](https://private-user-images.githubusercontent.com/13126686/311932751-8cd63172-4524-4e1b-a84c-0efc127fd6be.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Njg1MTM0NTYsIm5iZiI6MTc2ODUxMzE1NiwicGF0aCI6Ii8xMzEyNjY4Ni8zMTE5MzI3NTEtOGNkNjMxNzItNDUyNC00ZTFiLWE4NGMtMGVmYzEyN2ZkNmJlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAxMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMTE1VDIxMzkxNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTM5ZThhMjlkZTVmZmM2YTlhOWI3ZDQ5OGY3ZDZjMTg2MWIwZjRjNjA4MGU2ZmUyNWY1YTE3NjE2ZGY0Y2IyN2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.JnCkGVIk0I0Ajzl0TYGEw4s6cSszDuIS-Aa9msUaMb0)?
I didn’t see them in the `datasketches-characterization` repo.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]