yanngyoung opened a new issue, #5017: URL: https://github.com/apache/incubator-gluten/issues/5017
### Description

We're comparing the performance of Gluten and vanilla Spark on an Iceberg data source, and we found that the reported input bytes for the same table scan stage differ greatly. We first wondered whether this was due to Gluten's improved filter pushdown. However, the BatchScan metrics in Gluten report two kinds of input bytes:

- raw input bytes, which we understand to be the raw input read from disk, before decompression and decoding;
- input bytes total, which is the size of the input vectors after decompression and decoding.

Vanilla Spark reports an input size comparable to Gluten's second metric (input bytes total). Gluten, however, shows the first metric (raw input bytes) on the stage page of the Spark UI. Could we align this with vanilla Spark so that users are not misled?

<img width="1475" alt="image1" src="https://github.com/apache/incubator-gluten/assets/150935300/d8ae3ca3-c42d-4755-b6d3-1f04b35acd27">
<img width="1469" alt="image2" src="https://github.com/apache/incubator-gluten/assets/150935300/c5925e56-c397-4b90-b1c7-decee34bfdef">
<img width="724" alt="image3" src="https://github.com/apache/incubator-gluten/assets/150935300/e3393d7c-1ce1-4d41-8b78-ae9f27f23a46">
