[GitHub] [iceberg] ConeyLiu opened a new pull request #3249: Optimized spark vectorized read parquet decimal

GitBox Fri, 08 Oct 2021 04:52:19 -0700


ConeyLiu opened a new pull request #3249:
URL: https://github.com/apache/iceberg/pull/3249



   Arrow uses 16 bytes per value for every decimal vector. However, Parquet 
can store decimal data as INT32 or INT64, depending on the precision. For 
int/long-backed decimal data, we only need an int/long Arrow vector. This can 
improve performance significantly when doing a full table scan of the 
store_sales table (1TB data scale).
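   The idea of sizing the vector by the decimal's backing type can be sketched as follows. This is a minimal illustration only, assuming the Parquet DECIMAL limits (INT32 for precision up to 9, INT64 for precision up to 18) and Arrow's 16-byte decimal width. The class `DecimalWidths` and its methods are hypothetical, not Iceberg's or Arrow's API. A real reader would dispatch on the column's actual physical type rather than on precision alone, because writers may also store low-precision decimals as FIXED_LEN_BYTE_ARRAY.

```java
import java.math.BigDecimal;

// Hypothetical helper: choose the per-value width of the in-memory vector
// from the decimal precision instead of always using 16 bytes.
public final class DecimalWidths {
    // Parquet allows DECIMAL on INT32 up to precision 9 and on INT64 up to 18.
    static final int MAX_INT_DIGITS = 9;
    static final int MAX_LONG_DIGITS = 18;
    // Arrow's decimal vector stores each value as a 128-bit (16-byte) integer.
    static final int ARROW_DECIMAL_WIDTH = 16;

    private DecimalWidths() {}

    /** Bytes per value for the narrowest vector that can hold the unscaled value. */
    static int widthBytes(int precision) {
        if (precision < 1 || precision > 38) {
            throw new IllegalArgumentException("precision out of range: " + precision);
        }
        if (precision <= MAX_INT_DIGITS) {
            return 4; // int-backed
        }
        if (precision <= MAX_LONG_DIGITS) {
            return 8; // long-backed
        }
        return ARROW_DECIMAL_WIDTH; // keep the 16-byte decimal vector
    }

    /** Rebuild a decimal value from the unscaled integer stored in the column. */
    static BigDecimal fromUnscaled(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        int precision = 7; // e.g. a DECIMAL(7,2) column
        long narrow = rows * widthBytes(precision);
        long wide = rows * ARROW_DECIMAL_WIDTH;
        System.out.println("int-backed: " + narrow + " bytes, 16-byte decimal: " + wide + " bytes");
        System.out.println(fromUnscaled(12345L, 2)); // prints 123.45
    }
}
```

   For one million DECIMAL(7,2) values, the int-backed vector needs 4,000,000 bytes versus 16,000,000 bytes for the 16-byte decimal vector, which is where the savings in memory and copying come from.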
   
   
![image](https://user-images.githubusercontent.com/12733256/136551929-ad5191b1-fa04-4c3e-ab4a-7b9c82489958.png)
   
   The existing unit tests should cover the vectorized read case. I can add 
extra unit tests if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.