avantgardnerio commented on PR #22571:
URL: https://github.com/apache/datafusion/pull/22571#issuecomment-4565373184

   > > The heap still only stores non-NULL values. This avoids making the TopK 
heap implementations handle NULL values directly.
   > 
   > Also, perhaps @avantgardnerio you can help review this PR as I think you 
are familiar with the original logic
   
   After some consideration, I think a separate bool is the way to go. The heap 
was generic over arrow::Primitive types, which does not make a solution 
allowing NULL straight-forward. We can probably make it generic over 
Option<arrow::Primitive>, but that's probably less performant? And it certainly 
stores more information than necessary.
   
   The boolean seems like a pragmatic decision given performance concerns , 
which is why we do all this craziness in the first place. One note though: in 
the hot loop it's probably better to just always write the value rather than a 
conditional branch. Since we write much more than read, the check could move to 
the read side.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to