yanngyoung opened a new issue, #5017:
URL: https://github.com/apache/incubator-gluten/issues/5017

   ### Description
   
   We're trying to compare the performance of Gluten and vanilla Spark for the 
Iceberg data source, and we found that for the same table scan stage, the input 
bytes differ greatly. We wondered whether this was due to Gluten's improved 
filter pushdown, but the BatchScan metrics show that Gluten reports two types 
of input bytes: raw input bytes (which we understand as the raw input from 
disk, before decompression and decoding) and input bytes total (which is the 
input vector size after decompression and decoding).
   
   Vanilla Spark, by contrast, shows an input size that is comparable to the 
second value (input bytes total) in Gluten, but Gluten displays the first one 
(raw input bytes) prominently in the stage UI. Could we unify this with vanilla 
Spark to avoid potential misunderstanding by users?
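
   As a rough illustration of why the two numbers can diverge so much, here is 
a toy sketch in plain Python. It is not Gluten, Velox, or Spark code: the 
`scan_metrics` helper and the use of zlib are assumptions made only to show 
bytes read from storage versus bytes after decompression.

```python
import zlib


def scan_metrics(compressed_chunks):
    """Return (raw_input_bytes, decoded_input_bytes) for compressed chunks.

    Hypothetical helper: raw bytes are what would be read from disk,
    decoded bytes are the size after decompression.
    """
    raw = sum(len(chunk) for chunk in compressed_chunks)
    decoded = sum(len(zlib.decompress(chunk)) for chunk in compressed_chunks)
    return raw, decoded


# Highly repetitive column data compresses very well, so the gap is large.
column = b"iceberg" * 10_000
chunks = [zlib.compress(column)]

raw_bytes, decoded_bytes = scan_metrics(chunks)
print(f"raw input bytes:     {raw_bytes}")
print(f"decoded input bytes: {decoded_bytes}")
```

   If one engine reports the first number and the other reports the second, 
the same scan can look very different in the UI even when the work is the same.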
   
   <img width="1475" alt="image1" 
src="https://github.com/apache/incubator-gluten/assets/150935300/d8ae3ca3-c42d-4755-b6d3-1f04b35acd27">
   <img width="1469" alt="image2" 
src="https://github.com/apache/incubator-gluten/assets/150935300/c5925e56-c397-4b90-b1c7-decee34bfdef">
   
   <img width="724" alt="image3" 
src="https://github.com/apache/incubator-gluten/assets/150935300/e3393d7c-1ce1-4d41-8b78-ae9f27f23a46">


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
