Xinyu Zeng created ORC-1232:
-------------------------------
Summary: Disable metrics collector by default
Key: ORC-1232
URL: https://issues.apache.org/jira/browse/ORC-1232
Project: ORC
Issue Type: Improvement
Reporter: Xinyu Zeng
ORC-961 introduced a metrics collector for the reader. However, it can slow
down reading ORC files noticeably. It may be helpful to disable it by
default.
Reproducible experiment result:
Alibaba Cloud
[ecs.s6-c1m4.xlarge|https://help.aliyun.com/document_detail/25378.html#s6],
running Ubuntu 20.04, ESSD PL1 40GB
The original file is a 4.1GB CSV file of generated strings with some degree of
repetitiveness (the values of one column follow a Zipfian distribution). The ORC
file, written with dictionary encoding and no block compression, is 319MB.
Time of running orc-scan with metrics enabled: 7.5s
Time of running orc-scan with metrics disabled: 1.5s
Metrics were disabled by adding
readerOpts.setReaderMetrics(nullptr);
after
https://github.com/apache/orc/blob/02e48107b36b8ed868797dadcd7355a632519d48/tools/src/FileScan.cc#L26
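To illustrate why passing a null collector can remove per-call overhead, here is a minimal, self-contained sketch of the general pattern. It does not reproduce ORC's code: DemoMetrics, ScopedTimer, scanValues and scanBothWays are hypothetical names invented for this example, and the assumption that the cost comes from clock reads and atomic updates on the hot path has not been verified against the ORC reader.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a metrics collector (not ORC's ReaderMetrics).
struct DemoMetrics {
  std::atomic<uint64_t> decodeCalls{0};
  std::atomic<uint64_t> decodeNanos{0};
};

// RAII timer: reads the clock and updates counters only when a collector
// is supplied. With a null pointer, both clock reads are skipped.
class ScopedTimer {
 public:
  explicit ScopedTimer(DemoMetrics* metrics) : metrics_(metrics) {
    if (metrics_ != nullptr) {
      start_ = std::chrono::steady_clock::now();
    }
  }
  ~ScopedTimer() {
    if (metrics_ != nullptr) {
      auto elapsed = std::chrono::steady_clock::now() - start_;
      auto nanos =
          std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
      metrics_->decodeNanos += static_cast<uint64_t>(nanos);
      metrics_->decodeCalls += 1;
    }
  }

 private:
  DemoMetrics* metrics_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical hot loop: one timed unit of work per value.
// Returns the sum 0 + 1 + ... + (n - 1).
uint64_t scanValues(uint64_t n, DemoMetrics* metrics) {
  uint64_t sum = 0;
  for (uint64_t i = 0; i < n; ++i) {
    ScopedTimer timer(metrics);
    sum += i;
  }
  return sum;
}

// Runs the same scan with and without a collector and checks that the
// result is identical; only the bookkeeping differs.
uint64_t scanBothWays(uint64_t n) {
  DemoMetrics metrics;
  uint64_t withMetrics = scanValues(n, &metrics);
  uint64_t withoutMetrics = scanValues(n, nullptr);
  assert(withMetrics == withoutMetrics);
  assert(metrics.decodeCalls == n);
  return withoutMetrics;
}
```

Calling scanValues(n, nullptr) follows the same code path as the metrics-enabled call but never touches the clock or the atomic counters, which is the effect the one-line change above aims for in orc-scan.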
--
This message was sent by Atlassian Jira
(v8.20.10#820010)