expxiaoli commented on pull request #971: URL: https://github.com/apache/orc/pull/971#issuecomment-997666048
Here is a perf test. I created an ORC table named src_table with a map<string, string> column named mapping, whose per-stripe string dictionary can occupy 466M of memory. Then I ran a Spark query to read this column:

insert overwrite table res_table select mapping['tag_a'] from src_table;

With the old ORC lib, the query runs successfully only if I set executor-memory to 2500M or larger. Otherwise the query fails with an OOM exception. Here is the perf result from the MAT tool when I run the query with 2500M executor-memory: (image not preserved)

With the ORC lib including this new patch, the query runs successfully even when executor-memory is decreased to 1200M.
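For anyone who wants to repeat the comparison, here is a minimal sketch that only builds the two command lines (it does not run Spark). It assumes the query is launched through the `spark-sql` CLI with its `--executor-memory` and `-e` options; the helper names `build_query`, `spark_sql_command`, and `sweep` are hypothetical, and the table, column, and key names are taken from the test described above.

```python
import shlex

# Table names from the perf test description.
SRC_TABLE = "src_table"
RES_TABLE = "res_table"


def build_query(src=SRC_TABLE, res=RES_TABLE, key="tag_a"):
    """Return the INSERT statement that reads a single key from the map column."""
    return f"insert overwrite table {res} select mapping['{key}'] from {src};"


def spark_sql_command(query, executor_memory):
    """Build an argv list for the spark-sql CLI (assumed to be on PATH)."""
    return ["spark-sql", "--executor-memory", executor_memory, "-e", query]


def sweep(memory_settings):
    """Yield one shell-quoted command line per executor-memory setting."""
    query = build_query()
    for mem in memory_settings:
        yield shlex.join(spark_sql_command(query, mem))


if __name__ == "__main__":
    # The two settings reported in the test: old lib needs 2500M, patched lib runs at 1200M.
    for line in sweep(["2500M", "1200M"]):
        print(line)
```

Running each printed command once against the old ORC lib and once against the patched one should show whether the OOM threshold moves as reported; the actual cluster setup and ORC jar selection are left out because they are not described here.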