tdunning commented on pull request #7076:
URL: https://github.com/apache/incubator-pinot/pull/7076#issuecomment-866066413


   Yes. Very helpful.
   
   I have a test that does roughly this. I keep the raw data for comparison, a
   t-digest that is never serialized, one that is serialized and deserialized
   for each sample and five that each get random subsets of data and then are
   merged at the end. The final analysis looks at discrepancies between the
   exact answers and the three digested approaches. Not done yet.
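
   A rough sketch of the shape of such a test is given here. It is not the
   actual test code. To stay runnable on the JDK alone it uses a hypothetical
   ToySketch (a fixed-width histogram) as a stand-in for a real t-digest, and
   the names DigestRoundTripCheck and maxErrors are likewise made up for
   illustration. Because the toy histogram merges and serializes losslessly,
   all three approaches agree exactly here; with a real t-digest they can
   differ, which is what the comparison is meant to measure.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for a t-digest: a fixed-width histogram over [0, 1).
// It is NOT a t-digest; it only mimics add / merge / ser-de.
class ToySketch {
    static final int BINS = 1000;
    final long[] counts = new long[BINS];
    long total = 0;

    void add(double x) {
        int bin = Math.min(BINS - 1, Math.max(0, (int) (x * BINS)));
        counts[bin]++;
        total++;
    }

    void add(ToySketch other) {
        for (int i = 0; i < BINS; i++) counts[i] += other.counts[i];
        total += other.total;
    }

    byte[] serialize() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES * BINS);
        for (long c : counts) buf.putLong(c);
        return buf.array();
    }

    static ToySketch deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        ToySketch s = new ToySketch();
        for (int i = 0; i < BINS; i++) {
            s.counts[i] = buf.getLong();
            s.total += s.counts[i];
        }
        return s;
    }

    // Midpoint of the first bin whose cumulative count reaches rank ceil(q * total).
    double quantile(double q) {
        long target = (long) Math.ceil(q * total);
        long seen = 0;
        for (int i = 0; i < BINS; i++) {
            seen += counts[i];
            if (seen > 0 && seen >= target) return (i + 0.5) / BINS;
        }
        return 1.0;
    }
}

class DigestRoundTripCheck {
    static final double[] QUANTILES = {0.01, 0.1, 0.5, 0.9, 0.99};

    // Worst absolute quantile error versus the raw data for
    // {never serialized, round-tripped per sample, five merged subsets}.
    static double[] maxErrors(long seed, int n) {
        Random random = new Random(seed);
        double[] raw = new double[n];
        ToySketch direct = new ToySketch();
        ToySketch roundTrip = new ToySketch();
        ToySketch[] parts = new ToySketch[5];
        for (int i = 0; i < parts.length; i++) parts[i] = new ToySketch();

        for (int i = 0; i < n; i++) {
            double x = random.nextDouble();
            raw[i] = x;
            direct.add(x);
            roundTrip.add(x);
            roundTrip = ToySketch.deserialize(roundTrip.serialize());
            parts[random.nextInt(parts.length)].add(x); // random subset
        }
        ToySketch merged = new ToySketch();
        for (ToySketch p : parts) merged.add(p);

        Arrays.sort(raw);
        ToySketch[] approaches = {direct, roundTrip, merged};
        double[] worst = new double[approaches.length];
        for (double q : QUANTILES) {
            double exact = raw[Math.min(n - 1, (int) (q * n))];
            for (int a = 0; a < approaches.length; a++) {
                double err = Math.abs(approaches[a].quantile(q) - exact);
                worst[a] = Math.max(worst[a], err);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] w = maxErrors(42L, 10_000);
        System.out.printf("direct=%.5f roundTrip=%.5f merged=%.5f%n", w[0], w[1], w[2]);
    }
}
```

   Swapping ToySketch for a real digest should only require changing the
   add, merge, ser-de and quantile calls; the comparison loop stays the same.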
   
   
   On Mon, Jun 21, 2021 at 11:28 AM Xiaotian (Jackie) Jiang <
   ***@***.***> wrote:
   
   > @tdunning <https://github.com/tdunning>
   >
   > The random t-digest objects are created as follows:
   >
   >   @Override
   >   Object getRandomRawValue(Random random) {
   >     TDigest tDigest = TDigest.createMergingDigest(COMPRESSION);
   >     tDigest.add(random.nextInt(MAX_VALUE));
   >     tDigest.add(random.nextInt(MAX_VALUE));
   >     return ObjectSerDeUtils.TDIGEST_SER_DE.serialize(tDigest);
   >   }
   >
   > The non-star-tree approach simply merges the t-digests in sequence without
   > ser-de; the star-tree approach pre-aggregates t-digests and then stores
   > the serialized merged t-digests (each pre-aggregated t-digest might go
   > through multiple rounds of ser-de).
   >
   > Non-star-tree:
   >
   >   TDigest tDigest = deserialize(tDigest1);
   >   tDigest.add(deserialize(tDigest2));
   >   tDigest.add(deserialize(tDigest3));
   >   tDigest.add(deserialize(tDigest4));
   >
   > Star-tree:
   >
   >   TDigest tDigest = deserialize(tDigest1);
   >   tDigest.add(deserialize(tDigest2));
   >   byte[] mergedTDigest1 = serialize(tDigest);
   >
   >   tDigest = deserialize(tDigest3);
   >   tDigest.add(deserialize(tDigest4));
   >   byte[] mergedTDigest2 = serialize(tDigest);
   >
   >   tDigest = deserialize(mergedTDigest1);
   >   tDigest.add(deserialize(mergedTDigest2));
   >
   > Then we compare the result for each quantile from the two resulting
   > t-digest objects.
   >
   > Hope this can explain the test logic.
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

