tdunning commented on issue #409:
URL: 
https://github.com/apache/datasketches-cpp/issues/409#issuecomment-1868164562

   The compression on the inputs is not necessary. It may have a positive
   effect on accuracy in a few cases, but I strongly doubt the effect will be
   important. In practice, most of the digests being merged will previously
   have been persisted and thus will have already been compressed.
   
   Just allocate a buffer big enough for all the live centroids in all of the
   merging digests, copy in the centroids and sort/compress.  A final copy to
   a trimmed buffer is a nice touch, but only necessary if the intermediate
   buffer is massive.
   
   An alternative would be to simply pour the centroids in, one digest at a
   time. Putting in part of a digest at a time will lead to situations where
   the invariant may be violated transiently until you get the rest of the
   data in. I haven't analyzed that scenario in detail so I can't say that it
   will be completely safe. The case where you add an entire t-digest at a
   time has been analyzed, however, and is safe. This extends to adding
   multiple complete digests.
   
   I like and sympathize with keeping arguments const and might well have
   preferred that approach were I to do it over again.
   
   
   On Fri, Dec 22, 2023 at 2:23 PM Alexander Saydakov ***@***.***>
   wrote:
   
   > I am looking at the code to merge t-digests. I don't like that a modifying
   > operation compress() is called on the input t-digests. I wonder whether
   > this is avoidable. I would think this is done for simplicity to avoid
   > dealing with unmerged buffer in the incoming t-digests, however the
   > compress() method forces compression even if the buffer is empty.
   >
   > There is a practice in C++ to pass const reference to the input object.
   > Having a requirement of a mutable input is quite ugly. And forcing
   > compression unnecessarily is even worse. I am not sure how to deal with
   > this.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > 
<https://github.com/apache/datasketches-cpp/issues/409#issuecomment-1868105273>,
   > or unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AAB5E6RT7HJBQ7GLML3F6P3YKYB5BAVCNFSM6AAAAABANDLXNOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGEYDKMRXGM>
   > .
   > You are receiving this because you commented.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to