leerho commented on issue #7187: Improve topN algorithm
URL: 
https://github.com/apache/incubator-druid/issues/7187#issuecomment-480349933
 
 
   @peferron Thank you for your thoughtful comments.  Clearly, we have to leave 
the decision to an informed user, and all we can do is our best to make 
sure that the user is informed.
   
   ----
   
   Bouncing back to the top of this thread, we are developing a new sketch that 
we are tentatively calling "Frequent Unique Nodes" (FUN).  
   
   Suppose you have a stream that contains pairs {IP address, UserID}, and you 
wish to identify the IP addresses that have the largest number of unique users. 
 In this context, think of a large graph where the IP addresses and users are 
nodes in the graph.  If Node1 = IP and Node2 = ID, then we want to 
identify the Node1s that have the largest number of unique Node2s.  Conversely, 
it might also be interesting to identify the Node2s (IDs) that have the largest 
number of unique Node1s (IPs).  Conceptually, this can also be extended to 
more than just 2 nodes (although don't go nuts with this!).
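   
   To make the query concrete, here is a minimal exact baseline in Python. It is not the FUN sketch and says nothing about how that sketch works internally: it simply keeps a full set of Node2 values for every Node1, so its memory grows with the number of distinct pairs, which is the cost a sketch is meant to avoid. The function name `top_nodes_by_unique_neighbors` and the sample data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def top_nodes_by_unique_neighbors(pairs, k):
    """Exact baseline (not a sketch): the k Node1 values with the most distinct Node2 values."""
    neighbors = defaultdict(set)
    for node1, node2 in pairs:
        neighbors[node1].add(node2)  # a set counts each repeated pair only once
    counts = [(node1, len(node2s)) for node1, node2s in neighbors.items()]
    # Sort by descending unique count, then by key so ties come out in a stable order.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts[:k]


# Hypothetical stream of {IP address, UserID} pairs.
stream = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "alice"),  # duplicate pair
    ("10.0.0.2", "alice"),
    ("10.0.0.3", "carol"),
    ("10.0.0.3", "dave"),
    ("10.0.0.3", "erin"),
]

# IPs (Node1) with the most unique users (Node2).
print(top_nodes_by_unique_neighbors(stream, 2))
# prints [('10.0.0.3', 3), ('10.0.0.1', 2)]

# The converse query: swap each pair to rank users by unique IPs.
print(top_nodes_by_unique_neighbors([(user, ip) for ip, user in stream], 1))
# prints [('alice', 2)]
```

   An exact answer like this is useful mainly as ground truth when characterizing the accuracy of an approximate result on the same stream.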
   
   With this new sketch you will be able to perform these types of queries and 
have some guarantees of accuracy as well.
   
   If this is of interest, please let me know; we could use your help in 
characterizing and performance-testing it, if possible.
   
   Lee.
   
   
   
   
   
