Yizhou-Yang commented on PR #5007:
URL: 
https://github.com/apache/incubator-gluten/pull/5007#issuecomment-3822702483

   @WangGuangxin 
   Hi, I would like to double-check why this PR has not been merged into the 
main branch, since I would like to support approx_percentile.
   
   Based on my debugging:
   1) The function signature is different, which is why this rewrite rule is 
needed to fit into Gluten.
   <img width="1165" height="391" alt="Clipboard_Screenshot_1769764660" 
src="https://github.com/user-attachments/assets/dec5b7fd-f804-4a60-bb1b-b2e184914b03"
 />
   <img width="1213" height="168" alt="Clipboard_Screenshot_1769764751" 
src="https://github.com/user-attachments/assets/aa1d4957-0bde-43cd-81bb-4c4060fde510"
 />
   
   2) Velox uses the KLL algorithm, which is more space-efficient but not 
deterministic, whereas Spark uses the GK (Greenwald-Khanna) algorithm, which is 
slower but deterministic. The results might therefore differ from Spark's if 
the current Velox implementation is used.
   
   Are there any other differences that I have not yet noticed?
   
   If my observations are correct, can I take over your existing work, make my 
changes to it, and implement a native SparkApproximatePercentileAggregate 
function in the Velox project (based on Spark's GK algorithm, which I will 
port to native code), then adapt it to Gluten? I will add a Gluten config so 
that the user always has the option to choose between GK, KLL, and fallback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

