[
https://issues.apache.org/jira/browse/HDFS-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Clampffer updated HDFS-9486:
----------------------------------
Attachment: HDFS-9486.HDFS-8707.000.patch
Attached patch.
The issue was caused by stale code, originally meant to prevent another type of
bug that used to be common, together with the loss over time of a comment
warning about this potential problem. Here's what was going on:
1) FileSystemImpl's destructor would explicitly reset
FileSystemImpl::io_service_.
2) Then the NameNodeOperations member of FileSystemImpl would be implicitly
destroyed; it had an RpcEngine member that would in turn be destroyed.
3) The RpcEngine destructor would then attempt to do some work on the
underlying asio::io_service (it keeps a pointer to it) after the io_service
had already been destroyed.
The fix is just to remove the explicit unique_ptr reset and add a comment saying
that FileSystemImpl::io_service_ must always be the first declared member
variable, so that it is guaranteed to be the last member variable destroyed.
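
For anyone unfamiliar with the ordering rule the fix relies on, here is a
minimal sketch. It is not the code from the patch: Service, Engine, Owner and
DestructionOrder are hypothetical stand-ins for asio::io_service, RpcEngine,
FileSystemImpl and a test driver. It only illustrates that C++ destroys
non-static members in reverse order of declaration, so a member declared first
outlives the members that hold pointers to it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for asio::io_service; logs when it is used and destroyed.
struct Service {
  explicit Service(std::vector<std::string>* log) : log_(log) {}
  ~Service() { log_->push_back("service destroyed"); }
  void Touch() { log_->push_back("engine used live service"); }
  std::vector<std::string>* log_;
};

// Hypothetical stand-in for RpcEngine: keeps a raw pointer to the service
// and does some work on it from its destructor.
struct Engine {
  explicit Engine(Service* service) : service_(service) {}
  ~Engine() {
    assert(service_ != nullptr);
    service_->Touch();  // only safe if the Service is still alive
  }
  Service* service_;
};

// Hypothetical stand-in for FileSystemImpl.
class Owner {
 public:
  explicit Owner(std::vector<std::string>* log)
      : service_(new Service(log)), engine_(service_.get()) {}
  // Deliberately no destructor that calls service_.reset(): an explicit
  // reset there would run before engine_ is destroyed and leave it with
  // a dangling pointer.

 private:
  // Must stay the first declared member so it is the last one destroyed.
  std::unique_ptr<Service> service_;
  Engine engine_;  // declared after service_, so destroyed before it
};

// Builds and destroys an Owner, returning the order of the logged events.
std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  {
    Owner owner(&log);
  }  // owner goes out of scope: engine_ is destroyed first, then service_
  return log;
}
```

In this sketch the engine's destructor runs while the service is still valid.
Swapping the two member declarations, or resetting service_ in an explicit
destructor body, would reproduce the use-after-free pattern described above.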
> Valgrind failures when using more than 1 io_service worker thread.
> ------------------------------------------------------------------
>
> Key: HDFS-9486
> URL: https://issues.apache.org/jira/browse/HDFS-9486
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Assignee: James Clampffer
> Attachments: HDFS-9486-stacks-sanitized.txt,
> HDFS-9486.HDFS-8707.000.patch
>
>
> Valgrind catches an invalid read of size 8. Setup: 4 io_service worker
> threads, 64 threads doing open-read-close on a small file.
> Stack:
> ==8351== Invalid read of size 8
> ==8351== at 0x51F45C:
> asio::detail::reactive_socket_recv_op<asio::mutable_buffers_1,
> asio::detail::read_op<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> >, asio::mutable_buffers_1,
> asio::detail::transfer_all_t, std::_Bind<std::_Mem_fn<void
> (hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> > >::*)(std::error_code const&,
> unsigned long)>
> (hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> > >*, std::_Placeholder<1>,
> std::_Placeholder<2>)> > >::do_complete(asio::detail::task_io_service*,
> asio::detail::task_io_service_operation*, std::error_code const&, unsigned
> long) (functional:601)
> ==8351== by 0x508B10: hdfs::IoServiceImpl::Run()
> (task_io_service_operation.hpp:37)
> ==8351== by 0x55BCBEF: ??? (in
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
> ==8351== by 0x5A2D181: start_thread (pthread_create.c:312)
> ==8351== by 0x5D3D47C: clone (clone.S:111)
> ==8351== Address 0x67e3eb0 is 0 bytes inside a block of size 216 free'd
> ==8351== at 0x4C2C2BC: operator delete(void*) (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==8351== by 0x51F7B2:
> hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> > >::~RpcConnectionImpl()
> (rpc_connection.h:32)
> ==8351== by 0x50C104: hdfs::FileSystemImpl::~FileSystemImpl()
> (unique_ptr.h:67)
> ==8351== by 0x503A10: hdfs::HadoopFileSystem::~HadoopFileSystem()
> (unique_ptr.h:67)
> ==8351== by 0x503B28: hdfs::HadoopFileSystem::~HadoopFileSystem()
> (hdfs_cpp.cc:140)
> ==8351== by 0x503580: hdfs_internal::~hdfs_internal() (unique_ptr.h:67)
> ==8351== by 0x502FEE: hdfsDisconnect (hdfs.cc:127)
> ==8351== by 0x5010B7: main (threaded_stress_test.cc:74)
> ==8351==
> pure virtual method called
> terminate called without an active exception