[ https://issues.apache.org/jira/browse/IMPALA-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472420#comment-16472420 ]

Tim Armstrong commented on IMPALA-7014:
---------------------------------------

I don't think we can justify spending 75ms of CPU time on each of these 
stacktraces, even if they are occasionally useful. I personally think the 
reasonable options here are:
* Disable the slow symbolisation.
* Disable the slow symbolisation, invest some engineering effort into making 
symbolisation fast, then turn on the fast symbolisation.

Which option is better depends on whether fast symbolisation would be more 
impactful than other things we could do to improve supportability.
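
For reference, the mechanical part of the first option is small; the hard part 
is making symbolisation itself fast. Below is a minimal sketch of flag-gated 
symbolisation using glibc's backtrace()/backtrace_symbols() rather than the 
machinery Impala actually uses in debug-util.cc; all names here are 
illustrative, not the real code.
{noformat}
#include <execinfo.h>  // backtrace(), backtrace_symbols()
#include <cstdlib>
#include <sstream>
#include <string>

// Stand-in for the --symbolize_stacktrace flag.
static bool symbolize_stacktrace = false;

std::string GetStackTraceSketch() {
  void* frames[64];
  // Cheap step: walk the stack and record raw return addresses.
  int depth = backtrace(frames, 64);
  std::ostringstream ss;
  if (symbolize_stacktrace) {
    // Slow path: resolve each address to a symbol name in-process.
    char** symbols = backtrace_symbols(frames, depth);
    for (int i = 0; i < depth; ++i) ss << "    @ " << symbols[i] << "\n";
    free(symbols);
  } else {
    // Fast path: emit raw addresses only; they can be symbolised offline
    // with addr2line against the same binary.
    for (int i = 0; i < depth; ++i) ss << "    @ " << frames[i] << "\n";
  }
  return ss.str();
}
{noformat}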

We've definitely wasted far more time chasing down performance problems caused 
by this than we've gained in benefit from it. What I've generally found is that 
the backtrace itself is at best marginally useful, because these are all 
application-level errors that should include all of the relevant information in 
the error message itself (or in a log message). E.g. in the case above there is 
really only one code path that produces that error, and the interesting thing 
is which file had the problem. There are definitely cases where a backtrace is 
a useful backup for figuring out where an uninformative error message came 
from, but we should be fixing the error messages in those cases anyway.

I'd contrast this to Java, where stacktraces for exceptions are tremendously 
useful for runtime errors like NullPointerException that aren't debuggable 
without knowing where they came from in the code.

> Disable stacktrace symbolisation by default
> -------------------------------------------
>
>                 Key: IMPALA-7014
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7014
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Not Applicable
>            Reporter: Tim Armstrong
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> We got burned by the cost of producing stacktraces again with IMPALA-6996. I 
> did a quick investigation into this, based on the hypothesis that the 
> symbolisation was the expensive part, rather than getting the addresses. I 
> added a stopwatch to GetStackTrace() to measure the time in nanoseconds and 
> ran a test that produces a backtrace.
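> A rough sketch of that instrumentation (using std::chrono here rather than 
> Impala's own stopwatch class; CollectAndSymbolizeFrames() is just a stand-in 
> for the existing body of GetStackTrace()):
> {noformat}
> #include <chrono>
> #include <string>
> #include <glog/logging.h>
>
> std::string CollectAndSymbolizeFrames();  // stand-in for the existing trace-building code
>
> std::string GetStackTrace() {
>   auto start = std::chrono::steady_clock::now();
>   std::string trace = CollectAndSymbolizeFrames();
>   auto end = std::chrono::steady_clock::now();
>   auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
>   LOG(INFO) << "stacktrace time: " << ns;  // the "stacktrace time" lines below
>   return trace;
> }
> {noformat}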
> The first experiment was to run with symbolisation enabled:
> {noformat}
> $ start-impala-cluster.py --impalad_args='--symbolize_stacktrace=true' && impala-py.test tests/query_test/test_scanners.py -k codec
> I0511 09:45:11.897944 30904 debug-util.cc:283] stacktrace time: 75175573
> I0511 09:45:11.897956 30904 status.cc:125] File 'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_308108.db/bad_codec/bad_codec.parquet' uses an unsupported compression: 5000 for column 'id'.
>     @          0x18782ef  impala::Status::Status()
>     @          0x2cbe96f  impala::ParquetMetadataUtils::ValidateRowGroupColumn()
>     @          0x205f597  impala::BaseScalarColumnReader::Reset()
>     @          0x1feebe6  impala::HdfsParquetScanner::InitScalarColumns()
>     @          0x1fe6ff3  impala::HdfsParquetScanner::NextRowGroup()
>     @          0x1fe58d8  impala::HdfsParquetScanner::GetNextInternal()
>     @          0x1fe3eea  impala::HdfsParquetScanner::ProcessSplit()
>     @          0x1f6ba36  impala::HdfsScanNode::ProcessSplit()
>     @          0x1f6adc4  impala::HdfsScanNode::ScannerThread()
>     @          0x1f6a1c4  _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
>     @          0x1f6c2a6  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x1bd3b1a  boost::function0<>::operator()()
>     @          0x1ebecd5  impala::Thread::SuperviseThread()
>     @          0x1ec6e71  boost::_bi::list5<>::operator()<>()
>     @          0x1ec6d95  boost::_bi::bind_t<>::operator()()
>     @          0x1ec6d58  boost::detail::thread_data<>::run()
>     @          0x31b3ada  thread_proxy
>     @     0x7f9be67d36ba  start_thread
>     @     0x7f9be650941d  clone
> {noformat}
> The stacktrace took 75ms, which is pretty bad! It would be worse on a 
> production system with more memory maps.
> The next experiment was to disable it:
> {noformat}
> $ start-impala-cluster.py --impalad_args='--symbolize_stacktrace=false' && impala-py.test tests/query_test/test_scanners.py -k codec
> I0511 09:43:47.574185 29514 debug-util.cc:283] stacktrace time: 29528
> I0511 09:43:47.574193 29514 status.cc:125] File 'hdfs://localhost:20500/test-warehouse/test_bad_compression_codec_cb5d0225.db/bad_codec/bad_codec.parquet' uses an unsupported compression: 5000 for column 'id'.
>     @          0x18782ef
>     @          0x2cbe96f
>     @          0x205f597
>     @          0x1feebe6
>     @          0x1fe6ff3
>     @          0x1fe58d8
>     @          0x1fe3eea
>     @          0x1f6ba36
>     @          0x1f6adc4
>     @          0x1f6a1c4
>     @          0x1f6c2a6
>     @          0x1bd3b1a
>     @          0x1ebecd5
>     @          0x1ec6e71
>     @          0x1ec6d95
>     @          0x1ec6d58
>     @          0x31b3ada
>     @     0x7fbdcbdef6ba
>     @     0x7fbdcbb2541d
> {noformat}
> That's 2545x faster! If the addresses are in the statically linked binary, we 
> can use addr2line to get the line numbers back:
> {noformat}
> $ addr2line -e be/build/latest/service/impalad 0x2cbe96f
> /home/tarmstrong/Impala/incubator-impala/be/src/exec/parquet-metadata-utils.cc:166
> {noformat}
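> For a whole trace, the addresses can also be fed to addr2line in one batch. A 
> hypothetical invocation (trace.txt stands for a file holding the logged 
> frames; -f -C additionally prints demangled function names):
> {noformat}
> $ grep -o '0x[0-9a-f]*' trace.txt | addr2line -e be/build/latest/service/impalad -f -C
> {noformat}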


