leerho edited a comment on issue #6865: Densify swapped hll buffer
URL: https://github.com/apache/incubator-druid/pull/6865#issuecomment-462086552
 
 
   @gianm 
   
   Looking for guidance here...
   
   I have spent the past few days studying the Druid-HLL code and have 
uncovered at least a half-dozen serious bugs and haven't even started on the 
merge logic, which from a brief look also has very serious problems.  A number 
of these problems are interconnected, so you can't just fix one at a time.   
   
   This code needs to be redesigned from scratch.  I'm not sure I want to 
undertake this, but if I were, I would insist on some major changes. The API 
will have some required changes:  The biggest one is removing the ability for 
users to specify the hash function.  Any users that are currently doing that, 
using a different hash function and have historical stored images may not be 
able to use their history.  
   
   I am considering 2 strategies:
   1. **Rewrite existing code**: Many current internal methods that are now 
public will become private or package-private: e.g., no public access to 
internals such as the overflow registers, getNumNonZeroRegisters, 
getHeaderBytes, getPayloadBytePosition, setVersion, etc.  I may also insist on 
a merge class rather than a merge method ( fold() ).  
   
   The storage would be a little larger (from 1031 bytes to perhaps 1080 
bytes).  And merge performance may be a bit slower.  It could still be backward 
compatible, but old images will still propagate errors into the new design and 
there is nothing that can be done about that.  Users that record their history 
with the new design will see much better error performance.  
   
   2. **API Wrapper to the DS-HLL sketch**.  This will require some changes to 
the DS-HLL code, but may be easier to do and would eliminate a lot of code 
duplication.  Basically I have to allow for the non-seeded hash function, adapt 
to the big-endian fields used by the Druid-HLL, create a merge function that 
can read the old Druid-HLL images, etc.  
   
   The advantage of this is (hopefully) a single code-base for the sketch 
internals with two different wrappers, one for the old Druid-HLL users, and a 
new full-function API wrapper for the DS-HLL customers.
   
   Whatever, the new design would have to be extensively characterized and 
tested.
   
   Thoughts?
   
   
   
   
   
   
   
   
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to