[ 
https://issues.apache.org/jira/browse/IMPALA-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080440#comment-18080440
 ] 

Joe McDonnell commented on IMPALA-14973:
----------------------------------------

Immediately before the crash, the logs show the query failing to open a file:
{noformat}
I0XXX 12:34:21.731873   241 krpc-data-stream-mgr.cc:427] Reduced stream ID 
cache from 425 items, to 388, eviction took: 0
hdfsOpenFile(s3a://path/to/table/filename): 
FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
 error:
FileNotFoundException: No such file or directory: s3a://path/to/table/filename
java.io.FileNotFoundException: No such file or directory: 
s3a://path/to/table/filename
 at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4079)
 at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3930)
 at 
org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5645)
 at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$executeOpen$4(S3AFileSystem.java:1808)
 at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
 at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
 at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
 at org.apache.hadoop.fs.s3a.S3AFileSystem.executeOpen(S3AFileSystem.java:1806)
 at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1780)
I0XXX 12:34:25.239046   477 status.cc:71] Disk I/O error on 
hostname.svc.cluster.local:27010: Failed to open HDFS file 
s3a://path/to/table/filename
Error(2): No such file or directory
Root cause: FileNotFoundException: No such file or directory: 
s3a://path/to/table/filename
    @           0xf1f22b
    @          0x2175e03
    @          0x2176dc7
    @          0x2178f7b
    @          0x1b13997
    @          0x23bbcb7
    @     0xffffa0b7d8b7
    @     0xffff9eca8afb
Minidump in thread [6428]exec-finstance 
(finst:084b347a927565de:be826ca200000037) running query 
084b347a927565de:be826ca200000000, fragment instance 
084b347a927565de:be826ca200000037
Minidump in thread [6428]exec-finstance 
(finst:084b347a927565de:be826ca200000037) running query 
084b347a927565de:be826ca200000000, fragment instance 
084b347a927565de:be826ca200000037
Wrote minidump to 
/opt/impala/logs/minidumps/impalad/a4d970fd-0e83-4883-f136ccb1-65e5ac7d.dmp{noformat}

> Crash when opening a ScannerContext::Stream on an Iceberg table
> ---------------------------------------------------------------
>
>                 Key: IMPALA-14973
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14973
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Priority: Critical
>
> On a cluster, we observed a crash with this stack trace:
> {noformat}
> #0  0x0000000001c79638 in impala::ScannerContext::Stream::Stream 
> (this=0x180d29b80, parent=0x18f77140, scan_range=0x1a0b81d40, 
> reservation=8388608, file_desc=0x0) at scanner-context.cc:86
> #1  0x0000000001c7b290 in impala::ScannerContext::AddStream 
> (this=this@entry=0x18f77140, range=0x1a0b81d40, reservation=8388608) at 
> scanner-context.cc:91
> #2  0x0000000001c2a0a0 in impala::HdfsScanNodeMt::GetNext (this=0x2e172000, 
> state=<optimized out>, row_batch=0x2a76cdc0, eos=0x39992b01) at 
> ../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:103
> #3  0x0000000001d02908 in impala::StreamingAggregationNode::GetRowsStreaming 
> (this=this@entry=0x39992900, state=state@entry=0x7a3c8000, 
> out_batch=out_batch@entry=0x7925fe00)
>     at 
> /grid/0/jenkins/workspace/workspace/CDWH-parallel-redhat8/SOURCES/impala_arm/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:173
> #4  0x0000000001d034ac in impala::StreamingAggregationNode::GetNext 
> (this=0x39992900, state=0x7a3c8000, row_batch=0x7925fe00, eos=0xfffe427bff77) 
> at streaming-aggregation-node.cc:77
> #5  0x00000000014573f0 in impala::FragmentInstanceState::ExecInternal 
> (this=this@entry=0x1a00afd40) at 
> ../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:109
> #6  0x0000000001458ce0 in impala::FragmentInstanceState::Exec 
> (this=this@entry=0x1a00afd40) at fragment-instance-state.cc:104
> #7  0x00000000013ed280 in impala::QueryState::ExecFInstance (this=0x1c0ae000, 
> fis=0x1a00afd40) at query-state.cc:1013
> #8  0x0000000001b13998 in boost::function0<void>::operator() 
> (this=0xb9ce0890) at 
> ../../../toolchain/toolchain-packages-gcc10.4.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:763
> #9  impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()> const&, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., 
> functor=..., parent_thread_info=<optimized out>, 
> thread_started=0xfffe3f5b9a30) at thread.cc:360
> #10 0x00000000023bbcb8 in boost::(anonymous namespace)::thread_proxy 
> (param=0xb9ce0700) at libs/thread/src/pthread/thread.cpp:179
> #11 0x0000ffffb97968b8 in start_thread () from /lib64/libpthread.so.0
> #12 0x0000ffffb78c1afc in removexattr () from /lib64/libc.so.6{noformat}
> It is suspicious that file_desc=0x0. This indicates that 
> ScanRangeSharedState::GetFileDesc() returned null. That could happen if it 
> was called with a partition_id or filename that is not present in 
> file_descs_. On a debug build this would hit a DCHECK, but on a release 
> build it returns null.
> This has not reproduced so far. We need to reproduce it and find the root 
> cause. At the very least, this needs better diagnostics so that more 
> information is available if it happens again.
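
As a rough illustration of the failure mode described above, here is a minimal, self-contained C++ sketch. It is not Impala's code: FileDesc, FileDescRegistry, and OpenStream are hypothetical stand-ins, and the assumption is only that the lookup is keyed by (partition_id, filename) and returns null on a miss. The caller checks the result and reports the missing key instead of dereferencing a null pointer, which is the kind of diagnostic the description asks for.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a file descriptor (not Impala's actual type).
struct FileDesc {
  std::string filename;
};

// Hypothetical registry keyed by (partition_id, filename).
class FileDescRegistry {
 public:
  void Add(int64_t partition_id, const std::string& filename) {
    file_descs_.emplace(std::make_pair(partition_id, filename),
                        FileDesc{filename});
  }

  // Returns nullptr when the key is not registered. A debug-only assertion
  // here would catch the miss in debug builds, but a release build would
  // still hand the null pointer back to the caller.
  const FileDesc* Get(int64_t partition_id, const std::string& filename) const {
    auto it = file_descs_.find(std::make_pair(partition_id, filename));
    if (it == file_descs_.end()) return nullptr;
    return &it->second;
  }

 private:
  std::map<std::pair<int64_t, std::string>, FileDesc> file_descs_;
};

// Caller-side guard: on a miss, fill in an error message naming the key
// that was looked up and return false rather than using a null descriptor.
bool OpenStream(const FileDescRegistry& registry, int64_t partition_id,
                const std::string& filename, std::string* error) {
  const FileDesc* desc = registry.Get(partition_id, filename);
  if (desc == nullptr) {
    *error = "no file descriptor for partition_id=" +
             std::to_string(partition_id) + " filename=" + filename;
    return false;
  }
  return true;
}
```

With a guard of this shape, a lookup for an unregistered partition or file fails the stream setup with a message that names both values, so a recurrence would at least show which key was missing.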


