[
https://issues.apache.org/jira/browse/HDFS-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Clampffer updated HDFS-11266:
-----------------------------------
Status: Patch Available (was: Open)
> libhdfs++: Redesign block reader with simplicity and resource management
> in mind
> -------------------------------------------------------------------------------------
>
> Key: HDFS-11266
> URL: https://issues.apache.org/jira/browse/HDFS-11266
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Assignee: James Clampffer
> Attachments: HDFS-11266.HDFS-8707.000.patch
>
>
> The goal here is to significantly simplify the block reader and make it much
> harder to introduce issues. There are plenty of examples of these issues in
> the subtasks of HDFS-8707; the one that finally motivated a reimplementation
> is HDFS-10931.
> Goals:
> -The read side protocol of the data transfer pipeline is fundamentally really
> simple (even if done asynchronously). The code should be equally simple.
> -Get the code in a state that should be easy enough to reason about with a
> solid understanding of HDFS and basic understanding of C++ and vice versa:
> improve comments and avoid using esoteric C++ constructs. This is a
> must-have in order to lower the bar to contribute.
> -Get rid of dependencies on the existing continuation code. Others and I
> have spent far too much time debugging both the continuation code and
> bugs introduced because the continuation code was hard to reason about.
> Notable issues:
> -It's cool from a theoretical perspective, but after 18 months of working
> on this it's still unclear what problem the continuation pattern helped solve
> that callbacks couldn't.
> -They spend more time allocating memory than the rest of the code does
> doing real work - seriously, profile it. This can't be fixed because the
> Pipeline takes ownership of all Continuation objects and then deletes them.
> -The way the block reader really uses them is a hybrid of a state machine,
> continuations, and directly using asio callbacks to bounce between the two.
> Proposed approach:
> Still have a BlockReader class that owns a PacketReader class. The packet
> reader is analogous to the ReadPacketContinuation that the BlockReader builds
> now. The difference is that none of this will be stitched together at
> runtime using continuations; instead, the block reader has a member packet
> reader that gets allocated up front. The PacketReader can be recycled
> in order to avoid allocations. The block reader is only responsible for
> requesting block info; after that it keeps invoking the PacketReader until
> enough data has been read.
> Async chaining:
> Move to a state machine based approach. This allows the readers to be pinned
> in memory, where each state is represented as a method. The asynchronous IO
> becomes the state transitions. A callback is supplied to the asio async call
> that jumps to the next state upon completion of the IO operation. Epsilon
> transitions will be fairly rare, but if we need them to temporarily drop a
> lock, as is done in the RPC code, io_service::post can be used rather than a
> call that actually does IO.
> I'm fairly confident in this approach since I used the same approach to
> implement various hardware async bus interfaces in VHDL to good effect, i.e.
> high performance and easy to understand. An asio callback is roughly
> analogous to a signal in a sensitivity list, as the methods are to process
> blocks.
> Example state machine, using the approach described above, that sends some
> data and then waits to get something back, like what the current
> BlockReader::AsyncRequestBlock does.
> {code}
> #include <array>
> #include <asio.hpp>
>
> class ExampleHandshake {
>   // class would own any small buffers so they can be directly accessed
>  public:
>   void SendHandshake();
>  private:
>   void OnHandshakeSend();
>   void OnHandshakeDone();
>   asio::io_service service_;
>   asio::ip::tcp::socket socket_{service_};
>   // buffer sizes are placeholders for illustration
>   std::array<char, 64> request_;
>   std::array<char, 64> response_;
> };
>
> void ExampleHandshake::SendHandshake() {
>   // trampoline to jump into read state once write completes
>   auto trampoline = [this](asio::error_code ec, size_t sz) {
>     // error checking here
>     this->OnHandshakeSend();
>   };
>   // async_write returns immediately; the trampoline runs on completion
>   asio::async_write(socket_, asio::buffer(request_), trampoline);
> }
>
> void ExampleHandshake::OnHandshakeSend() {
>   // when read completes bounce into handler
>   auto trampoline = [this](asio::error_code ec, size_t sz) {
>     // error checking here
>     this->OnHandshakeDone();
>   };
>   asio::async_read(socket_, asio::buffer(response_), trampoline);
> }
>
> void ExampleHandshake::OnHandshakeDone() {
>   // just finished sending request and receiving response, go do something
> }
> {code}
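
For illustration only, the overall shape of the proposed design (a block reader that owns a recyclable packet reader, with states as methods and callbacks as transitions) can be sketched without asio. Everything below is hypothetical and not taken from the attached patch: FakeService is a stand-in for asio::io_service that models only a post-like call, and the names SketchPacketReader and SketchBlockReader, the 4-byte packet size, and the in-memory string that plays the role of the datanode are assumptions made so the sketch is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for asio::io_service: a queue of deferred handlers.
class FakeService {
 public:
  void post(std::function<void()> fn) { queue_.push_back(std::move(fn)); }
  // Runs queued handlers, including ones posted while running, until empty.
  void run() {
    while (!queue_.empty()) {
      auto fn = std::move(queue_.front());
      queue_.pop_front();
      fn();
    }
  }
 private:
  std::deque<std::function<void()>> queue_;
};

constexpr std::size_t kPacketSize = 4;  // placeholder packet size

// Hypothetical packet reader: constructed once and reused for every packet.
class SketchPacketReader {
 public:
  explicit SketchPacketReader(FakeService& service) : service_(service) {}
  // Copies up to kPacketSize bytes of `block` starting at `offset` into the
  // reusable payload buffer, then posts `handler` as the "IO completion".
  void AsyncReadPacket(const std::string& block, std::size_t offset,
                       std::size_t remaining,
                       std::function<void(const std::string&)> handler) {
    assert(offset <= block.size());
    payload_.assign(block, offset, std::min(kPacketSize, remaining));
    service_.post([this, handler]() { handler(payload_); });
  }
 private:
  FakeService& service_;
  std::string payload_;  // recycled between packets
};

// Hypothetical block reader: each state is a method, each callback a transition.
class SketchBlockReader {
 public:
  SketchBlockReader(FakeService& service, std::string block_data)
      : service_(service),
        block_data_(std::move(block_data)),
        packet_reader_(service) {}

  // Entry state. The first transition does no IO (an epsilon transition),
  // so it is posted to the service instead of being tied to a read.
  void AsyncRead(std::size_t want,
                 std::function<void(const std::string&)> on_done) {
    want_ = std::min(want, block_data_.size());
    on_done_ = std::move(on_done);
    result_.clear();
    packets_ = 0;
    service_.post([this]() { ReadNextPacket(); });
  }
  std::size_t packets_read() const { return packets_; }

 private:
  void ReadNextPacket() {
    if (result_.size() >= want_) { OnDone(); return; }
    packet_reader_.AsyncReadPacket(
        block_data_, result_.size(), want_ - result_.size(),
        [this](const std::string& payload) { OnPacketRead(payload); });
  }
  void OnPacketRead(const std::string& payload) {
    result_ += payload;  // consume the payload before the reader is reused
    ++packets_;
    ReadNextPacket();
  }
  void OnDone() { on_done_(result_); }

  FakeService& service_;
  std::string block_data_;            // plays the role of the remote block
  SketchPacketReader packet_reader_;  // member, allocated up front
  std::string result_;
  std::size_t want_ = 0;
  std::size_t packets_ = 0;
  std::function<void(const std::string&)> on_done_;
};
```

To drive the sketch, construct a FakeService and a SketchBlockReader over some string, call AsyncRead with a byte count and a completion callback, then call run() on the service. Nothing happens until run() is called, which mirrors how handlers only fire inside io_service::run. The reader objects must outlive the run because the handlers capture `this`, which is the "pinned in memory" property the description relies on.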