[ https://issues.apache.org/jira/browse/IMPALA-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548433#comment-16548433 ]
ASF subversion and git services commented on IMPALA-7014:
---------------------------------------------------------
Commit 980076117f43d2189b5fc9484ef0c1c54c2c18c1 in impala's branch
refs/heads/master from [~zoram]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9800761 ]
IMPALA-7014: Disable stacktrace symbolisation by default
Stacktrace symbolization has been shown to be about 2500x slower
than just printing the unsymbolized stacktrace.
This has burned us a few times now, so let's disable it by
default.
Change-Id: If3af209890ccc242beb742145c63eb6836d4bfbb
Reviewed-on: http://gerrit.cloudera.org:8080/10964
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Disable stacktrace symbolisation by default
> -------------------------------------------
>
> Key: IMPALA-7014
> URL: https://issues.apache.org/jira/browse/IMPALA-7014
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Not Applicable
> Reporter: Tim Armstrong
> Assignee: Zoram Thanga
> Priority: Critical
>
> We got burned by the cost of producing stacktraces again with IMPALA-6996. I
> did a quick investigation into this, based on the hypothesis that the
> symbolisation was the expensive part, rather than getting the addresses. I
> added a stopwatch to GetStackTrace() to measure the time in nanoseconds and
> ran a test that produces a backtrace.
> The first experiment was:
> {noformat}
> $ start-impala-cluster.py --impalad_args='--symbolize_stacktrace=true' &&
> impala-py.test tests/query_test/test_scanners.py -k codec
> I0511 09:45:11.897944 30904 debug-util.cc:283] stacktrace time: 75175573
> I0511 09:45:11.897956 30904 status.cc:125] File 'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_308108.db/bad_codec/bad_codec.parquet' uses an unsupported compression: 5000 for column 'id'.
> @ 0x18782ef impala::Status::Status()
> @ 0x2cbe96f impala::ParquetMetadataUtils::ValidateRowGroupColumn()
> @ 0x205f597 impala::BaseScalarColumnReader::Reset()
> @ 0x1feebe6 impala::HdfsParquetScanner::InitScalarColumns()
> @ 0x1fe6ff3 impala::HdfsParquetScanner::NextRowGroup()
> @ 0x1fe58d8 impala::HdfsParquetScanner::GetNextInternal()
> @ 0x1fe3eea impala::HdfsParquetScanner::ProcessSplit()
> @ 0x1f6ba36 impala::HdfsScanNode::ProcessSplit()
> @ 0x1f6adc4 impala::HdfsScanNode::ScannerThread()
> @ 0x1f6a1c4 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @ 0x1f6c2a6 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x1bd3b1a boost::function0<>::operator()()
> @ 0x1ebecd5 impala::Thread::SuperviseThread()
> @ 0x1ec6e71 boost::_bi::list5<>::operator()<>()
> @ 0x1ec6d95 boost::_bi::bind_t<>::operator()()
> @ 0x1ec6d58 boost::detail::thread_data<>::run()
> @ 0x31b3ada thread_proxy
> @ 0x7f9be67d36ba start_thread
> @ 0x7f9be650941d clone
> {noformat}
> The stacktrace took 75ms, which is pretty bad! It would be worse on a
> production system with more memory maps.
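> The memory-map remark can be checked on a live process. This assumes the symbolization goes through glog's symbolizer, which (to my understanding) looks up each frame's address in /proc/self/maps before reading symbols from the matching object file, so its cost grows with the number of mappings. A minimal Linux-only sketch that counts a process's mappings; `count_memory_maps` is a hypothetical helper, not part of Impala:

```python
import os
import sys


# Assumption: symbolization cost grows with the number of mappings because the
# symbolizer searches /proc/<pid>/maps for each frame's address (Linux only).
def count_memory_maps(pid="self"):
    """Return the number of mappings listed in /proc/<pid>/maps."""
    path = os.path.join("/proc", str(pid), "maps")
    with open(path) as maps:
        return sum(1 for _ in maps)


if __name__ == "__main__":
    # Pass an impalad PID to inspect a running daemon; defaults to this process.
    target = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(f"{target}: {count_memory_maps(target)} mappings")
```

> Reading another process's maps file usually requires running as the same user (or root).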
> The next experiment was to disable it:
> {noformat}
> $ start-impala-cluster.py --impalad_args='--symbolize_stacktrace=false' &&
> impala-py.test tests/query_test/test_scanners.py -k codec
> I0511 09:43:47.574185 29514 debug-util.cc:283] stacktrace time: 29528
> I0511 09:43:47.574193 29514 status.cc:125] File 'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_cb5d0225.db/bad_codec/bad_codec.parquet' uses an unsupported compression: 5000 for column 'id'.
> @ 0x18782ef
> @ 0x2cbe96f
> @ 0x205f597
> @ 0x1feebe6
> @ 0x1fe6ff3
> @ 0x1fe58d8
> @ 0x1fe3eea
> @ 0x1f6ba36
> @ 0x1f6adc4
> @ 0x1f6a1c4
> @ 0x1f6c2a6
> @ 0x1bd3b1a
> @ 0x1ebecd5
> @ 0x1ec6e71
> @ 0x1ec6d95
> @ 0x1ec6d58
> @ 0x31b3ada
> @ 0x7fbdcbdef6ba
> @ 0x7fbdcbb2541d
> {noformat}
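> The two "stacktrace time:" values are nanoseconds from the stopwatch in GetStackTrace(), one sample per configuration, so the speedup is plain arithmetic. A small check (the constant and function names are illustrative only):

```python
# Stopwatch readings in nanoseconds, copied from the two "stacktrace time:" log lines.
SYMBOLIZED_NS = 75_175_573   # --symbolize_stacktrace=true
UNSYMBOLIZED_NS = 29_528     # --symbolize_stacktrace=false


def speedup(slow_ns, fast_ns):
    """Ratio of the slower measurement to the faster one."""
    return slow_ns / fast_ns


if __name__ == "__main__":
    print(f"symbolized:   {SYMBOLIZED_NS / 1e6:.1f} ms")
    print(f"unsymbolized: {UNSYMBOLIZED_NS / 1e3:.1f} us")
    print(f"speedup:      {speedup(SYMBOLIZED_NS, UNSYMBOLIZED_NS):.1f}x")
```

> This prints roughly 75.2 ms, 29.5 us and 2545.9x; the 2545x below is the truncated ratio.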
> That's 2545x faster! If the addresses are in the statically linked binary, we
> can use addr2line to get back the line numbers:
> {noformat}
> $ addr2line -e be/build/latest/service/impalad 0x2cbe96f
> /home/tarmstrong/Impala/incubator-impala/be/src/exec/parquet-metadata-utils.cc:166
> {noformat}
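> addr2line also accepts several addresses in one invocation, so a whole unsymbolized trace can be resolved at once. A minimal sketch, assuming the trace has been copied from the log into a text file and that the binary is non-PIE, so the logged addresses match the file's addresses; the helper names and trace.txt are hypothetical. Frames in shared libraries (such as 0x7fbdcbdef6ba and 0x7fbdcbb2541d above) are not in impalad and will typically come back as ??.

```python
import re
import subprocess
import sys

# Matches frame lines such as "@ 0x2cbe96f impala::..." or a bare "@ 0x18782ef".
FRAME_RE = re.compile(r"@\s+(0x[0-9a-fA-F]+)")


def extract_addresses(log_text):
    """Return the frame addresses of a glog-style stacktrace, in order."""
    return FRAME_RE.findall(log_text)


def addr2line_command(binary, addresses):
    """Build one addr2line call: -f prints function names, -C demangles them."""
    return ["addr2line", "-f", "-C", "-e", binary, *addresses]


def symbolize(binary, addresses):
    """Run addr2line once; stdout has two lines (function, file:line) per address."""
    result = subprocess.run(addr2line_command(binary, addresses),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Usage: python symbolize_trace.py be/build/latest/service/impalad < trace.txt
    print(symbolize(sys.argv[1], extract_addresses(sys.stdin.read())), end="")
```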
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)