Fang-Yu Rao created IMPALA-13165:
------------------------------------

             Summary: Impala daemon crashed with OMException in Ozone build
                 Key: IMPALA-13165
                 URL: https://issues.apache.org/jira/browse/IMPALA-13165
             Project: IMPALA
          Issue Type: Bug
            Reporter: Fang-Yu Rao
            Assignee: Yida Wu


We found in an internal Ozone build that the Impala daemon crashed with many 
OMExceptions.

For instance, the backend test 
[Multi8RandomSpillToRemoteMix()|https://github.com/apache/impala/blob/master/be/src/runtime/bufferpool/buffer-pool-test.cc#L2065C24-L2070]
 failed with the following stack trace, collected from the generated minidump.
{code}
Thread 502 (crashed)
 0  libc.so.6 + 0x36387
    rax = 0x0000000000000000   rdx = 0x0000000000000006
    rcx = 0xffffffffffffffff   rbx = 0x000000000607d920
    rsi = 0x0000000000000cfa   rdi = 0x00000000000028ec
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0428
     r8 = 0x0000000000000000    r9 = 0x00007fd6662f02e0
    r10 = 0x0000000000000008   r11 = 0x0000000000000202
    r12 = 0x000000000607d920   r13 = 0x000000000607d980
    r14 = 0x0000000000000152   r15 = 0x0000000000000223
    rip = 0x00007fd77dbd1387
    Found by: given as instruction pointer in context
 1  libc.so.6 + 0x37a78
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0430
    rip = 0x00007fd77dbd2a78
    Found by: stack scanning
 2  buffer-pool-test!google_breakpad::ExceptionHandler::HandleSignal(int, siginfo_t*, void*) + 0x1a0
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f04b8
    rip = 0x0000000003a29e40
    Found by: stack scanning
 3  buffer-pool-test!tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) + 0x68
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f04f0
    rip = 0x0000000003b6f858
    Found by: stack scanning
 4  buffer-pool-test!tcmalloc::malloc_oom(unsigned long) + 0xc0
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0500
    rip = 0x0000000003d07f20
    Found by: stack scanning
 5  buffer-pool-test!google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) [clone .part.0] + 0xad0
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0558
    rip = 0x00000000039faa00
    Found by: stack scanning
 6  buffer-pool-test!google::DumpStackTraceAndExit() [clone .cold] + 0x5
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0560
    rip = 0x0000000000f00e4f
    Found by: stack scanning
 7  libstdc++.so.6 + 0x13aa48
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0570
    rip = 0x00007fd78132ea48
    Found by: stack scanning
 8  libstdc++.so.6 + 0x13aa48
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0580
    rip = 0x00007fd78132ea48
    Found by: stack scanning
 9  libstdc++.so.6 + 0x11f8e2
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f05b0
    rip = 0x00007fd7813138e2
    Found by: stack scanning
10  buffer-pool-test!google::LogDestination::WaitForSinks(google::LogMessage::LogMessageData*) + 0x110
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f05e0
    rip = 0x00000000039f6460
    Found by: stack scanning
11  buffer-pool-test!google::LogMessage::Fail() + 0xd 
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0610
    rip = 0x00000000039ef6bd
    Found by: stack scanning
12  buffer-pool-test!google::LogMessage::SendToLog() + 0x244
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0620
    rip = 0x00000000039f15f4
    Found by: stack scanning
13  libstdc++.so.6 + 0x12cae4
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0640
    rip = 0x00007fd781320ae4
    Found by: stack scanning
14  buffer-pool-test!_fini + 0x19b3
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0648
    rip = 0x0000000003d0cb03
    Found by: stack scanning
15  buffer-pool-test!_fini + 0xa7c14
    rbp = 0x00007fd6662f06e0   rsp = 0x00007fd6662f0658
    rip = 0x0000000003db2d64
    Found by: stack scanning
16  buffer-pool-test!google::LogMessage::Flush() + 0x1ec
    rsp = 0x00007fd6662f06f0   rip = 0x00000000039ef09c
    Found by: stack scanning
17  libstdc++.so.6 + 0x12cae4
    rsp = 0x00007fd6662f0730   rip = 0x00007fd781320ae4
    Found by: stack scanning
18  buffer-pool-test!google::LogMessageFatal::~LogMessageFatal() + 0x9 
    rsp = 0x00007fd6662f0790   rip = 0x00000000039f1b19
    Found by: stack scanning
19  buffer-pool-test!impala::BufferPoolTest::TestRandomInternalImpl(impala::BufferPool*, impala::TmpFileGroup*, impala::MemTracker*, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>*, int, bool) [buffer-pool.h : 338 + 0x8]
    rsp = 0x00007fd6662f07a0   rip = 0x0000000000f8721f
    Found by: stack scanning
{code}

During the crash we also saw many OMExceptions in the console output.
{code}
08:46:11 hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3_0000000000000000_0000000000000000/impala-scratch-ae339172-59d6-41ef-9a6a-249c4d9ff537): FileSystem#create((Lorg/apache/hadoop/fs/Path;ZISJ)Lorg/apache/hadoop/fs/FSDataOutputStream;)hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3_0000000000000000_0000000000000000/impala-scratch-b305d4f9-8e61-4b96-afd4-9940bd8f48b6): FileSystem#create((Lorg/apache/hadoop/fs/Path;ZISJ)Lorg/apache/hadoop/fs/FSDataOutputStream;) error:
08:46:11 hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3_0000000000000000_0000000000000000/impala-scratch-45a6a781-55f7-44aa-9a06-2ed6a6242e92): FileSystem#create((Lorg/apache/hadoop/fs/Path;ZISJ)Lorg/apache/hadoop/fs/FSDataOutputStream;)hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3_0000000000000000_0000000000000000/impala-scratch-0f52e72b-2cc0-4697-a1aa-838891422844): FileSystem#create((Lorg/apache/hadoop/fs/Path;ZISJ)Lorg/apache/hadoop/fs/FSDataOutputStream;)hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3_0000000000000000_0000000000000000/impala-scratch-4d43c8b9-e062-474a-945c-9e20e4d50998): FileSystem#create((Lorg/apache/hadoop/fs/Path;ZISJ)Lorg/apache/hadoop/fs/FSDataOutputStream;) error:
08:46:11 hdfsOpenFile(ofs://localhost:9862/impala/tmp/impala-scratch/a44cc3c871369491_8dcaa671747530a3_0000000000000000_0000000000000000/impala-scratch-40a6653e-3b88-4c62-8d4e-1f23e2e5eb9c): FileSystem#create((Lorg/apache/hadoop/fs/Path;ZISJ)Lorg/apache/hadoop/fs/FSDataOutputStream;) error:
08:46:11  error:
08:46:11  error:
08:46:11 OMException: Allocated 0 blocks. Requested 1 blocksINTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
08:46:11        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:728)
08:46:11        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2133)
08:46:11        at org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2001)
08:46:11        at org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:822)
08:46:11        at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:389)
08:46:11        at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:299)
08:46:11        at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.create(BasicRootedOzoneFileSystem.java:261)
08:46:11        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1177)
08:46:11        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1157)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
