[
https://issues.apache.org/jira/browse/HDFS-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Clampffer updated HDFS-9486:
----------------------------------
Attachment: HDFS-9486-stacks-sanitized.txt
Attached a set of stacks showing a snapshot of the process right before the
invalid read. This run used 5 asio worker threads and 128 threads doing small
reads (12 byte file).

This only happens during disconnect. I think the likely cause is either
objects being destroyed in the wrong order in HadoopFileSystem's destructor
(this happened before and looked similar), or an object explicitly deleting a
pointer that is also held by a member smart pointer in some other object.

It seems to be very timing dependent, at least on my machine. It usually shows
up the first time I run valgrind with a cold FS cache and then doesn't appear
in subsequent runs.
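
To illustrate the destruction-order hypothesis: the quoted stack shows the read
handler bound to a raw RpcConnectionImpl pointer, so a handler still queued in
the io_service can run after ~RpcConnectionImpl() has freed the object. Below is
a minimal sketch of one common guard, in which the handler keeps the connection
alive by capturing a shared_ptr. It is not the libhdfs++ code and not a
proposed patch. Connection, Loop, FileSystem and RunDemo are hypothetical
stand-ins, and Loop is a single-threaded toy, not asio.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Records the order of events so the demo can be checked.
static std::vector<std::string> g_log;

// Toy event loop (hypothetical, single-threaded): queued handlers run when
// the loop is drained, which here also happens in its destructor.
struct Loop {
    std::vector<std::function<void()>> pending;
    void Drain() {
        for (auto& h : pending) h();
        pending.clear();  // destroys the handlers and whatever they captured
    }
    ~Loop() {
        Drain();
        g_log.push_back("loop destroyed");
    }
};

// Hypothetical connection. The handler captures a shared_ptr to the
// connection instead of a raw `this`, so the object outlives the handler.
struct Connection : std::enable_shared_from_this<Connection> {
    ~Connection() { g_log.push_back("connection destroyed"); }
    void OnRead() { g_log.push_back("handler ran"); }
    void StartRead(Loop& loop) {
        auto self = shared_from_this();
        loop.pending.push_back([self] { self->OnRead(); });
    }
};

// Members are destroyed in reverse declaration order: `conn` is released
// first, then `loop`. With a raw-pointer handler that order would leave the
// queued handler dangling; with the shared_ptr capture it stays valid.
struct FileSystem {
    Loop loop;
    std::shared_ptr<Connection> conn;
    FileSystem() : conn(std::make_shared<Connection>()) {
        conn->StartRead(loop);
    }
};

// Builds and tears down a FileSystem, returning the recorded event order.
std::vector<std::string> RunDemo() {
    g_log.clear();
    {
        FileSystem fs;
    }
    assert(g_log.size() == 3);
    return g_log;
}
```

Calling RunDemo() returns the events in the order "handler ran", "connection
destroyed", "loop destroyed": the connection is only freed once the last
handler referencing it is gone. Whether this pattern fits the real classes
depends on how ownership is set up there, which the trace alone doesn't show.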
> Valgrind failures when using more than 1 io_service worker thread.
> ------------------------------------------------------------------
>
> Key: HDFS-9486
> URL: https://issues.apache.org/jira/browse/HDFS-9486
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Assignee: James Clampffer
> Attachments: HDFS-9486-stacks-sanitized.txt
>
>
> Valgrind catches an invalid read of size 8. Setup: 4 io_service worker
> threads, 64 threads doing open-read-close on a small file.
> Stack:
> ==8351== Invalid read of size 8
> ==8351== at 0x51F45C:
> asio::detail::reactive_socket_recv_op<asio::mutable_buffers_1,
> asio::detail::read_op<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> >, asio::mutable_buffers_1,
> asio::detail::transfer_all_t, std::_Bind<std::_Mem_fn<void
> (hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> > >::*)(std::error_code const&,
> unsigned long)>
> (hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> > >*, std::_Placeholder<1>,
> std::_Placeholder<2>)> > >::do_complete(asio::detail::task_io_service*,
> asio::detail::task_io_service_operation*, std::error_code const&, unsigned
> long) (functional:601)
> ==8351== by 0x508B10: hdfs::IoServiceImpl::Run()
> (task_io_service_operation.hpp:37)
> ==8351== by 0x55BCBEF: ??? (in
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
> ==8351== by 0x5A2D181: start_thread (pthread_create.c:312)
> ==8351== by 0x5D3D47C: clone (clone.S:111)
> ==8351== Address 0x67e3eb0 is 0 bytes inside a block of size 216 free'd
> ==8351== at 0x4C2C2BC: operator delete(void*) (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==8351== by 0x51F7B2:
> hdfs::RpcConnectionImpl<asio::basic_stream_socket<asio::ip::tcp,
> asio::stream_socket_service<asio::ip::tcp> > >::~RpcConnectionImpl()
> (rpc_connection.h:32)
> ==8351== by 0x50C104: hdfs::FileSystemImpl::~FileSystemImpl()
> (unique_ptr.h:67)
> ==8351== by 0x503A10: hdfs::HadoopFileSystem::~HadoopFileSystem()
> (unique_ptr.h:67)
> ==8351== by 0x503B28: hdfs::HadoopFileSystem::~HadoopFileSystem()
> (hdfs_cpp.cc:140)
> ==8351== by 0x503580: hdfs_internal::~hdfs_internal() (unique_ptr.h:67)
> ==8351== by 0x502FEE: hdfsDisconnect (hdfs.cc:127)
> ==8351== by 0x5010B7: main (threaded_stress_test.cc:74)
> ==8351==
> pure virtual method called
> terminate called without an active exception