expxiaoli commented on pull request #971: URL: https://github.com/apache/orc/pull/971#issuecomment-1000203266
@dongjoon-hyun In my performance test, this patch shows no speed regression. Here are the scan-time results for Spark running the query "insert overwrite table res_table select mapping['tag_a'] from src_table". "Scan time" is the Spark metric in the doExecuteColumnar method of the DataSourceScanExec class, which measures time for the scan operator only.

| executor-memory | new ORC with this patch | old ORC |
|---|---|---|
| 2500M | 37.2 s | 34.6 s |
| 2250M | 36.5 s | OOM |
| 1100M | 30.2 s | OOM |
| 1000M | OOM | OOM |

Besides, this patch removes the memory allocation for DynamicByteArray as well as the memory copy from DynamicByteArray to a primitive byte array, and it does NOT add any other time-consuming logic. I think there is NO potential speed regression.
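
To illustrate the kind of saving described above, here is a minimal sketch that contrasts staging bytes in a growable intermediate buffer and then copying them out, versus reading straight into the final byte array. It is not the code from this patch: the class `DirectReadSketch` and both methods are hypothetical, and `ByteArrayOutputStream` is only a stand-in for a growable buffer such as DynamicByteArray.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; names are not taken from the ORC code base.
class DirectReadSketch {

    // Staged path: bytes go into a growable intermediate buffer first,
    // then toByteArray() allocates the result and copies everything again.
    static byte[] readViaIntermediate(ByteBuffer source, int length) {
        ByteArrayOutputStream staging = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int remaining = length;
        while (remaining > 0) {
            int n = Math.min(chunk.length, remaining);
            source.get(chunk, 0, n);      // copy 1: source -> chunk
            staging.write(chunk, 0, n);   // copy 2: chunk -> growable buffer
            remaining -= n;
        }
        return staging.toByteArray();     // copy 3: growable buffer -> result
    }

    // Direct path: allocate the destination once and fill it in place,
    // so peak memory is roughly one copy of the data instead of two.
    static byte[] readDirect(ByteBuffer source, int length) {
        byte[] dest = new byte[length];
        source.get(dest, 0, length);
        return dest;
    }

    public static void main(String[] args) {
        byte[] data = "tag_a=some value".getBytes(StandardCharsets.UTF_8);
        byte[] staged = readViaIntermediate(ByteBuffer.wrap(data), data.length);
        byte[] direct = readDirect(ByteBuffer.wrap(data), data.length);
        // Both paths return the same bytes; only the allocations differ.
        System.out.println(Arrays.equals(staged, direct));
    }
}
```

In the staged path the intermediate buffer and the result array both exist when the final copy is made, which is the sort of extra allocation that can matter when executor memory is tight.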