[GitHub] [arrow-datafusion] realno commented on issue #1486: Add median, std, and corr functions

GitBox Wed, 19 Jan 2022 11:32:29 -0800


realno commented on issue #1486:
URL: 
https://github.com/apache/arrow-datafusion/issues/1486#issuecomment-1016800036



   > @realno you can take this with a grain of salt as I am new to this.
   > 
   > My thinking is that I would prefer to see the exact median implementation 
before having an approximate one (i.e., the approximate version would be an 
add-on feature). I could be wrong, but I believe DataFusion had `DISTINCT` 
before `approx_distinct`.
   > 
   > Regarding the implementation - I thought that we would be able to use 
existing arrow compute kernels for this and not have to re-implement existing 
functionality:
   > 
   > * sort: 
https://docs.rs/arrow/latest/arrow/compute/kernels/sort/fn.sort.html
   > * length: 
https://docs.rs/arrow/latest/arrow/array/trait.Array.html#method.len
   > * value: 
https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.value
   > 
   > I suppose this would be somewhere between your Option 1 and Option 2.
   > 
   > I definitely defer to @alamb though.
   
   Thanks for the comments, @matthewmturner. I am also new and wouldn't call 
myself a database internals expert :) Yes, we have all the functionality ready; 
the complication is finding the best/most efficient way to implement this. I 
definitely want to hear more opinions on this. 
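
   For reference, the sort / length / value approach quoted above can be 
sketched roughly as follows. This is only an illustration in plain Rust: it uses 
a `&[f64]` slice as a stand-in for an Arrow array and does not call the arrow 
compute kernels, and the function name `exact_median` is hypothetical, not an 
existing DataFusion or arrow API. Null handling is left out.

```rust
/// Hypothetical helper: exact median of a slice of f64 values.
/// The slice stands in for an Arrow PrimitiveArray (assumption for illustration).
/// Returns None for empty input.
fn exact_median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    // Step 1 (sort): copy the input and sort it; total_cmp gives a total order on f64.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Step 2 (length): find the middle position.
    let n = sorted.len();
    let mid = n / 2;
    // Step 3 (value): read the middle element, or average the two middle ones.
    if n % 2 == 1 {
        Some(sorted[mid])
    } else {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

   The sketch needs every value in memory and sorted at once, which is the 
efficiency concern with an exact median.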
   
   Do you think it is worth having an approximation to unblock the perf 
benchmark work? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
