avantgardnerio commented on PR #22571: URL: https://github.com/apache/datafusion/pull/22571#issuecomment-4565373184
> > The heap still only stores non-NULL values. This avoids making the TopK heap implementations handle NULL values directly. > > Also, perhaps @avantgardnerio you can help review this PR as I think you are familiar with the original logic After some consideration, I think a separate bool is the way to go. The heap was generic over arrow::Primitive types, which does not make a solution allowing NULL straight-forward. We can probably make it generic over Option<arrow::Primitive>, but that's probably less performant? And it certainly stores more information than necessary. The boolean seems like a pragmatic decision given performance concerns , which is why we do all this craziness in the first place. One note though: in the hot loop it's probably better to just always write the value rather than a conditional branch. Since we write much more than read, the check could move to the read side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
