[jira] [Updated] (HDFS-9486) Valgrind failures when using more than 1 io_service worker thread.

James Clampffer (JIRA) Wed, 02 Dec 2015 07:51:02 -0800

     [ https://issues.apache.org/jira/browse/HDFS-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


James Clampffer updated HDFS-9486:
----------------------------------
    Attachment: HDFS-9486-stacks-sanitized.txt

Attached a set of stacks that give a snapshot of what things look like right 
before the invalid read.  This was captured with 5 asio worker threads and 128 
threads doing small reads (of a 12-byte file).

This only happens during disconnect.  I think the likely cause is either 
objects being destroyed in the wrong order in HadoopFileSystem's destructor 
(this has happened before and looked similar), or an object explicitly deleting 
a pointer that is also held by a member smart_ptr in some other object.
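
To make the suspected failure mode concrete: the stack in the issue description 
shows the read handler bound to a raw RpcConnectionImpl pointer, and the block 
being read was freed from ~RpcConnectionImpl().  Below is a minimal sketch of 
one common way to avoid that class of bug, assuming the connection can be owned 
through a shared_ptr.  FakeIoService, Connection and RunDemo are hypothetical 
stand-ins written for this illustration; they are not libhdfs++ or asio code, 
and this is not necessarily the fix that belongs in the real client.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for an io_service: handlers are queued and run later.
struct FakeIoService {
  std::queue<std::function<void()>> pending;

  void Post(std::function<void()> handler) { pending.push(std::move(handler)); }

  // Runs every queued handler and returns how many were run.
  int RunAll() {
    int count = 0;
    while (!pending.empty()) {
      std::function<void()> handler = std::move(pending.front());
      pending.pop();
      handler();
      ++count;
    }  // each handler (and anything it captured) is destroyed here
    return count;
  }
};

// Hypothetical connection type; NOT the real hdfs::RpcConnectionImpl.
class Connection : public std::enable_shared_from_this<Connection> {
 public:
  explicit Connection(bool *destroyed) : destroyed_(destroyed) {}
  ~Connection() { *destroyed_ = true; }

  void StartRead(FakeIoService &io) {
    // Capturing shared_from_this() instead of a raw `this` keeps the
    // object alive until the pending handler has run.
    std::shared_ptr<Connection> self = shared_from_this();
    io.Post([self]() { self->OnRead(); });
  }

 private:
  void OnRead() { ++reads_; }

  int reads_ = 0;
  bool *destroyed_;
};

struct DemoResult {
  bool alive_after_owner_reset;  // object survived its owner going away
  int handlers_run;              // number of handlers that completed
  bool destroyed_after_run;      // object freed once no handler needs it
};

DemoResult RunDemo() {
  bool destroyed = false;
  FakeIoService io;
  {
    auto conn = std::make_shared<Connection>(&destroyed);
    conn->StartRead(io);
  }  // the "file system" owner drops its reference while a read is pending

  DemoResult result;
  result.alive_after_owner_reset = !destroyed;
  result.handlers_run = io.RunAll();
  result.destroyed_after_run = destroyed;
  return result;
}
```

Calling RunDemo() from a test or from main() exercises the "owner goes away 
while a read is still queued" ordering.  With a raw pointer bound into the 
handler instead of `self`, the same ordering would touch freed memory, which is 
the kind of access valgrind reports as an invalid read.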

It seems to be very timing-dependent, at least on my machine.  It usually shows 
up the first time I run valgrind with a cold FS cache and then doesn't appear 
in subsequent runs.

> Valgrind failures when using more than 1 io_service worker thread.
> ------------------------------------------------------------------
>
>                 Key: HDFS-9486
>                 URL: https://issues.apache.org/jira/browse/HDFS-9486
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-9486-stacks-sanitized.txt
>
>
> Valgrind catches an invalid read of size 8.  Setup: 4 io_service worker 
> threads, 64 threads doing open-read-close on a small file.
> Stack:
> ==8351== Invalid read of size 8
> ==8351==    at 0x51F45C: 
> asio::detail::reactive_socket_recv_op<asio::mutable_buffers_1, 
> asio::detail::read_op<asio::basic_stream_socket<asio::ip::tcp, 
> asio::stream_socket_service<asio::ip::tcp> >, asio::mutable_buffers_1, 
> asio::detail::transfer_all_t, std::_Bind<std::_Mem_fn<void 
> (hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp, 
> asio::stream_socket_service<asio::ip::tcp> > >::*)(std::error_code const&, 
> unsigned long)> 
> (hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp, 
> asio::stream_socket_service<asio::ip::tcp> > >*, std::_Placeholder<1>, 
> std::_Placeholder<2>)> > >::do_complete(asio::detail::task_io_service*, 
> asio::detail::task_io_service_operation*, std::error_code const&, unsigned 
> long) (functional:601)
> ==8351==    by 0x508B10: hdfs::IoServiceImpl::Run() 
> (task_io_service_operation.hpp:37)
> ==8351==    by 0x55BCBEF: ??? (in 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
> ==8351==    by 0x5A2D181: start_thread (pthread_create.c:312)
> ==8351==    by 0x5D3D47C: clone (clone.S:111)
> ==8351==  Address 0x67e3eb0 is 0 bytes inside a block of size 216 free'd
> ==8351==    at 0x4C2C2BC: operator delete(void*) (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==8351==    by 0x51F7B2: 
> hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp, 
> asio::stream_socket_service<asio::ip::tcp> > >::~RpcConnectionImpl() 
> (rpc_connection.h:32)
> ==8351==    by 0x50C104: hdfs::FileSystemImpl::~FileSystemImpl() 
> (unique_ptr.h:67)
> ==8351==    by 0x503A10: hdfs::HadoopFileSystem::~HadoopFileSystem() 
> (unique_ptr.h:67)
> ==8351==    by 0x503B28: hdfs::HadoopFileSystem::~HadoopFileSystem() 
> (hdfs_cpp.cc:140)
> ==8351==    by 0x503580: hdfs_internal::~hdfs_internal() (unique_ptr.h:67)
> ==8351==    by 0x502FEE: hdfsDisconnect (hdfs.cc:127)
> ==8351==    by 0x5010B7: main (threaded_stress_test.cc:74)
> ==8351== 
> pure virtual method called
> terminate called without an active exception



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
