[ 
https://issues.apache.org/jira/browse/IMPALA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586626#comment-16586626
 ] 

Tim Armstrong commented on IMPALA-7402:
---------------------------------------

I have a theory that the race is in this code in ScannerContext::Stream::GetNextBuffer():
{noformat}
    ScanRange* range = parent_->scan_node_->AllocateScanRange(
        scan_range_->fs(), filename(), read_past_buffer_size, offset, partition_id,
        scan_range_->disk_id(), false, BufferOpts::Uncached());
    bool needs_buffers;
    RETURN_IF_ERROR(
        parent_->scan_node_->reader_context()->StartScanRange(range, &needs_buffers));
    if (needs_buffers) {
      // Allocate fresh buffers. The buffers for 'scan_range_' should be released now
      // since we hit EOS.
      if (reservation_ < io_mgr->min_buffer_size()) {
        return Status(Substitute("Could not read past end of scan range in file '$0'. "
            "Reservation provided $1 was < the minimum I/O buffer size",
            reservation_, io_mgr->min_buffer_size()));
      }
      RETURN_IF_ERROR(io_mgr->AllocateBuffersForRange(
          parent_->bp_client_, range, reservation_));
    }
    RETURN_IF_ERROR(range->GetNext(&io_buffer_));
{noformat}

My theory involves two scanner threads, A and B:
1. Thread A hits an error, in this case "Could only skip 0 header lines in first scan range but expected 2. Try increasing max_scan_range_length to a value larger than the size of the file's header."
2. Thread A starts the scan range in ScannerContext::Stream::GetNextBuffer().
3. Thread B calls reader_context_->Cancel() in SetDoneInternal(), which calls ScanRange::Cancel() on the allocated scan range.
4. Thread A calls AllocateBuffersForRange(), which calls ScanRange::AddUnusedBuffers().
5. Thread A calls ScanRange::GetNext(), which notices the cancellation and returns CANCELLED.
6. Thread A propagates the error without calling ScanRange::Cancel() on that range to free the added buffers.
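
The interleaving above can be replayed deterministically in a toy model. The sketch below is not Impala code: ToyScanRange, ReplayInterleaving() and the cancel_range_on_error flag are hypothetical names, the 8192-byte buffer size is only an illustrative value, and the cleanup on the error path is one possible fix under these assumptions, not necessarily the change made for this issue.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a scan range. This is NOT Impala's ScanRange class; it only
// models the two pieces of state that matter for the suspected race.
struct ToyScanRange {
  bool cancelled = false;
  int64_t unused_buffer_bytes = 0;  // Buffers attached to the range, not yet freed.

  // Models ScanRange::Cancel(): marks the range cancelled and frees whatever
  // buffers are attached at that moment. Returns the number of bytes freed.
  int64_t Cancel() {
    cancelled = true;
    int64_t freed = unused_buffer_bytes;
    unused_buffer_bytes = 0;
    return freed;
  }

  // Models ScanRange::AddUnusedBuffers(): attaches buffers to the range even
  // if it was already cancelled.
  void AddUnusedBuffers(int64_t bytes) { unused_buffer_bytes += bytes; }

  // Models ScanRange::GetNext(): fails once the range has been cancelled.
  bool GetNext() const { return !cancelled; }
};

// Replays steps 2-6 in a fixed order on a single thread and returns the bytes
// still attached to the range after the error has been propagated.
// 'cancel_range_on_error' enables the hypothetical cleanup on the error path.
int64_t ReplayInterleaving(bool cancel_range_on_error) {
  const int64_t kBufferBytes = 8192;  // Illustrative value only.
  ToyScanRange range;                    // Step 2: thread A starts the range.
  range.Cancel();                        // Step 3: thread B cancels it (nothing to free yet).
  range.AddUnusedBuffers(kBufferBytes);  // Step 4: thread A attaches buffers.
  bool ok = range.GetNext();             // Step 5: thread A sees the cancellation.
  if (!ok && cancel_range_on_error) {
    range.Cancel();                      // Hypothetical fix: free the buffers again.
  }
  return range.unused_buffer_bytes;      // Step 6: bytes left behind when the error propagates.
}
```

In this model, ReplayInterleaving(false) leaves the buffers attached to a range that nobody will cancel again, while ReplayInterleaving(true) releases them before the error is returned.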

The bug is that 

> DCHECK failed min_bytes_to_write <= dirty_unpinned_pages_ in buffer-pool
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-7402
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7402
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Vuk Ercegovac
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build
>
> One of the impalad's crashed with the following DCHECK failure:
> {noformat}
> F0806 01:26:21.905500  5101 buffer-pool.cc:645] Check failed: min_bytes_to_write <= dirty_unpinned_pages_.bytes() (8192 vs. 0)
> {noformat}
> Here is the backtrace:
> {noformat}
> #0 0x0000003af1e328e5 in raise () from /lib64/libc.so.6 
> #1 0x0000003af1e340c5 in abort () from /lib64/libc.so.6 
> #2 0x000000000437f454 in google::DumpStackTraceAndExit() () 
> #3 0x0000000004375ead in google::LogMessage::Fail() () 
> #4 0x0000000004377752 in google::LogMessage::SendToLog() () 
> #5 0x0000000004375887 in google::LogMessage::Flush() () 
> #6 0x0000000004378e4e in google::LogMessageFatal::~LogMessageFatal() () 
> #7 0x000000000205ad16 in impala::BufferPool::Client::WriteDirtyPagesAsync (this=0x17d03f0e0, min_bytes_to_write=8192) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:645
> #8 0x000000000205a835 in impala::BufferPool::Client::CleanPages (this=0x17d03f0e0, client_lock=0x7f324cb12220, len=8192) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:625
> #9 0x000000000205a646 in impala::BufferPool::Client::DecreaseReservationTo (this=0x17d03f0e0, max_decrease=8192, target_bytes=8192) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:609
> #10 0x0000000002057583 in impala::BufferPool::ClientHandle::DecreaseReservationTo (this=0x181c0a990, max_decrease=8192, target_bytes=8192) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:319
> #11 0x00000000020d9419 in impala::HdfsScanNode::ReturnReservationFromScannerThread (this=0x181c0a800, lock=..., bytes=8192) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/exec/hdfs-scan-node.cc:194
> #12 0x00000000020da485 in impala::HdfsScanNode::ScannerThread (this=0x181c0a800, first_thread=false, scanner_thread_reservation=8192) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/exec/hdfs-scan-node.cc:367
> #13 0x00000000020d96b0 in impala::HdfsScanNode::<lambda()>::operator()(void) const (__closure=0x7f324cb12b88) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/repos/Impala/be/src/exec/hdfs-scan-node.cc:261
> #14 0x00000000020db6d6 in boost::detail::function::void_function_obj_invoker0<impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::<lambda()>, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at /data/jenkins/workspace/impala-cdh6.x-exhaustive-centos6/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> ...
> {noformat}


