[GitHub] [orc] expxiaoli commented on pull request #971: ORC-1060: reduce memory usage when vectorized reading dictionary string encoding columns

GitBox Thu, 23 Dec 2021 02:33:18 -0800


expxiaoli commented on pull request #971:
URL: https://github.com/apache/orc/pull/971#issuecomment-1000203266



   @dongjoon-hyun In my perf test, this patch shows no speed regression. Here
are the scan time results for Spark running the query "insert overwrite table
res_table select mapping['tag_a'] from src_table".
   
   "scan time" is the Spark metric in the doExecuteColumnar method of the
DataSourceScanExec class; it measures time spent in the scan operator only.
   
   executor-memory | new ORC with this patch | old ORC
   2500M           | 37.2s                   | 34.6s
   2250M           | 36.5s                   | OOM
   1100M           | 30.2s                   | OOM
   1000M           | OOM                     | OOM
   
   Besides, this patch removes the memory allocation for DynamicByteArray as
well as the memory copy from DynamicByteArray to a primitive byte array, and
does NOT add any other time-consuming logic. I think there is NO potential
speed regression.
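   
   To illustrate the idea, here is a minimal sketch and not the actual ORC
code. It assumes the total byte length of the dictionary blob is known before
reading. ByteArrayOutputStream stands in for DynamicByteArray, and the class
and method names are hypothetical. The first method buffers and then copies;
the second allocates the final byte array once and reads into it directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; names do not come from the ORC code base.
public class DictionaryBlobSketch {

    // Two-step approach: accumulate into a growable buffer, then copy out.
    // ByteArrayOutputStream is a stand-in for DynamicByteArray (assumption).
    public static byte[] readViaGrowableBuffer(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // toByteArray() allocates a second array and copies every byte,
        // so the buffer and the result are both live at this point.
        return buffer.toByteArray();
    }

    // Direct approach: allocate the final array once and fill it in place.
    // Assumes the caller knows the total length up front.
    public static byte[] readDirect(InputStream in, int length) throws IOException {
        byte[] blob = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(blob, offset, length - offset);
            if (n == -1) {
                throw new EOFException("stream ended after " + offset + " bytes");
            }
            offset += n;
        }
        return blob;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "tag_atag_btag_c".getBytes(StandardCharsets.UTF_8);
        byte[] viaBuffer = readViaGrowableBuffer(new ByteArrayInputStream(data));
        byte[] direct = readDirect(new ByteArrayInputStream(data), data.length);
        // Both approaches yield the same bytes; only the allocations differ.
        System.out.println(Arrays.equals(viaBuffer, direct)); // prints "true"
    }
}
```

   In this sketch the direct variant avoids the intermediate buffer and the
final copy, which is the kind of saving described above.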
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

