[
https://issues.apache.org/jira/browse/IMPALA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586626#comment-16586626
]
Tim Armstrong commented on IMPALA-7402:
---------------------------------------
I have a theory that the race is in this code in
ScannerContext::Stream::GetNextBuffer():
{noformat}
ScanRange* range = parent_->scan_node_->AllocateScanRange(
    scan_range_->fs(), filename(), read_past_buffer_size, offset, partition_id,
    scan_range_->disk_id(), false, BufferOpts::Uncached());
bool needs_buffers;
RETURN_IF_ERROR(
    parent_->scan_node_->reader_context()->StartScanRange(range, &needs_buffers));
if (needs_buffers) {
  // Allocate fresh buffers. The buffers for 'scan_range_' should be released now
  // since we hit EOS.
  if (reservation_ < io_mgr->min_buffer_size()) {
    return Status(Substitute("Could not read past end of scan range in file '$0'. "
        "Reservation provided $1 was < the minimum I/O buffer size",
        reservation_, io_mgr->min_buffer_size()));
  }
  RETURN_IF_ERROR(io_mgr->AllocateBuffersForRange(
      parent_->bp_client_, range, reservation_));
}
RETURN_IF_ERROR(range->GetNext(&io_buffer_));
{noformat}
My theory involves two scanner threads, A and B:
1. thread A hits an error, in this case "Could only skip 0 header lines in
first scan range but expected 2. Try increasing max_scan_range_length to a
value larger than the size of the file's header."
2. thread A starts the scan range in ScannerContext::Stream::GetNextBuffer()
3. thread B calls reader_context_->Cancel() in SetDoneInternal(), which calls
ScanRange::Cancel() on the allocated scan range.
4. thread A calls AllocateBuffersForRange() which calls
ScanRange::AddUnusedBuffers()
5. thread A calls ScanRange::GetNext(), which notices the cancellation and
returns CANCELLED
6. thread A propagates the error without calling ScanRange::Cancel() on that
range to free the added buffers.
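
The interleaving in steps 2-6 can be replayed with a toy model. This is only a
sketch: ToyScanRange and ReplayInterleaving() are hypothetical names, not
Impala classes, and the buffer count of 3 is arbitrary. The steps run in a
fixed order on one thread so the outcome is deterministic. Calling Cancel()
on the error path is shown as one possible fix, assuming Cancel() is what
releases buffers attached to a range; it is not necessarily the change that
was made in Impala.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-in for a scan range. NOT Impala's real ScanRange class.
class ToyScanRange {
 public:
  // Marks the range cancelled and frees any buffers it currently holds.
  void Cancel() {
    std::lock_guard<std::mutex> l(lock_);
    cancelled_ = true;
    unused_buffers_ = 0;
  }

  // Hands 'n' buffers to the range. As in step 4, this model accepts
  // buffers even if the range was already cancelled.
  void AddUnusedBuffers(int n) {
    assert(n >= 0);
    std::lock_guard<std::mutex> l(lock_);
    unused_buffers_ += n;
  }

  // Returns false (standing in for a CANCELLED status) once cancelled.
  bool GetNext() {
    std::lock_guard<std::mutex> l(lock_);
    return !cancelled_;
  }

  int unused_buffers() {
    std::lock_guard<std::mutex> l(lock_);
    return unused_buffers_;
  }

 private:
  std::mutex lock_;
  bool cancelled_ = false;
  int unused_buffers_ = 0;
};

// Replays steps 2-6 in order and returns how many buffers are still
// attached to the range when thread A returns the error.
int ReplayInterleaving(bool cancel_on_error) {
  ToyScanRange range;             // step 2: thread A starts the range
  range.Cancel();                 // step 3: thread B cancels it
  range.AddUnusedBuffers(3);      // step 4: thread A adds buffers afterwards
  bool ok = range.GetNext();      // step 5: thread A sees the cancellation
  if (!ok && cancel_on_error) {
    range.Cancel();               // possible fix: free the added buffers
  }
  return range.unused_buffers();  // step 6: anything left here is leaked
}
```

In this model ReplayInterleaving(false) leaves the three buffers attached to
the cancelled range, while ReplayInterleaving(true) leaves none.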
The bug is that
> DCHECK failed min_bytes_to_write <= dirty_unpinned_pages_ in buffer-pool
> ------------------------------------------------------------------------
>
> Key: IMPALA-7402
> URL: https://issues.apache.org/jira/browse/IMPALA-7402
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.1.0
> Reporter: Vuk Ercegovac
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: broken-build
>
> One of the impalads crashed with the following DCHECK failure:
> {noformat}
> F0806 01:26:21.905500 5101 buffer-pool.cc:645] Check failed:
> min_bytes_to_write <= dirty_unpinned_pages_.bytes() (8192 vs. 0)
> {noformat}
> Here is the backtrace:
> {noformat}
> #0 0x0000003af1e328e5 in raise () from /lib64/libc.so.6
> #1 0x0000003af1e340c5 in abort () from /lib64/libc.so.6
> #2 0x000000000437f454 in google::DumpStackTraceAndExit() ()
> #3 0x0000000004375ead in google::LogMessage::Fail() ()
> #4 0x0000000004377752 in google::LogMessage::SendToLog() ()
> #5 0x0000000004375887 in google::LogMessage::Flush() ()
> #6 0x0000000004378e4e in google::LogMessageFatal::~LogMessageFatal() ()
> #7 0x000000000205ad16 in impala::BufferPool::Client::WriteDirtyPagesAsync
> (this=0x17d03f0e0, min_bytes_to_write=8192) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:645
>
> #8 0x000000000205a835 in impala::BufferPool::Client::CleanPages
> (this=0x17d03f0e0, client_lock=0x7f324cb12220, len=8192) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:625
>
> #9 0x000000000205a646 in impala::BufferPool::Client::DecreaseReservationTo
> (this=0x17d03f0e0, max_decrease=8192, target_bytes=8192) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:609
>
> #10 0x0000000002057583 in
> impala::BufferPool::ClientHandle::DecreaseReservationTo (this=0x181c0a990,
> max_decrease=8192, target_bytes=8192) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:319
>
> #11 0x00000000020d9419 in
> impala::HdfsScanNode::ReturnReservationFromScannerThread (this=0x181c0a800,
> lock=..., bytes=8192) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/exec/hdfs-scan-node.cc:194
>
> #12 0x00000000020da485 in impala::HdfsScanNode::ScannerThread
> (this=0x181c0a800, first_thread=false, scanner_thread_reservation=8192) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/exec/hdfs-scan-node.cc:367
>
> #13 0x00000000020d96b0 in impala::HdfsScanNode::<lambda()>::operator()(void)
> const (__closure=0x7f324cb12b88) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/exec/hdfs-scan-node.cc:261
>
> #14 0x00000000020db6d6 in
> boost::detail::function::void_function_obj_invoker0<impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::<lambda()>,
> void>::invoke(boost::detail::function::function_buffer &)
> (function_obj_ptr=...) at
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> ...
> {noformat}