expxiaoli commented on pull request #971:
URL: https://github.com/apache/orc/pull/971#issuecomment-997666048


   Here is a perf test.
   I created an ORC table named src_table with a map<string, string> column named mapping, whose stripe-level string dictionary can occupy about 466 MB of memory.
   Then I ran a Spark query that reads this column:
   insert overwrite table res_table select mapping['tag_a'] from src_table;
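   For reference, here is a simplified sketch of the setup. The DDL below is an approximation (only the mapping column is shown, and the res_table schema is assumed), not the exact statements I used:

   ```sql
   -- Source table: ORC format, with the map<string, string> column under test.
   -- The real table holds enough distinct values that a single stripe's string
   -- dictionary reaches roughly 466 MB.
   CREATE TABLE src_table (mapping map<string, string>) STORED AS ORC;

   -- Destination table for the query result (schema assumed for this sketch).
   CREATE TABLE res_table (tag_a string) STORED AS ORC;

   -- The query under test: read a single key out of the map column.
   INSERT OVERWRITE TABLE res_table SELECT mapping['tag_a'] FROM src_table;
   ```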
   
   With the old ORC lib, the query only runs successfully if I set executor-memory to 2500M or larger; otherwise it fails with an OOM exception. Here is the memory profile from the MAT tool when I run the query with 2500M of executor-memory:
   
   ![Memory usage of DynamicByteArray](https://user-images.githubusercontent.com/2948397/146728253-95e6530f-58d0-4ee0-97b3-e9d252cf2084.png)
   
   With the ORC lib containing this new patch, the query runs successfully with executor-memory decreased to 1200M.
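   For completeness, the executor memory above is just the standard spark-submit option; the rest of my submission command is omitted here:

   ```
   # Old ORC lib: 2500m was the smallest executor memory that avoided OOM.
   # With this patch applied: 1200m is enough for the same query.
   spark-submit --executor-memory 1200m ...
   ```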
   

