Jing Zhao created HDFS-16875:
--------------------------------

             Summary: Erasure Coding: data access proxy to allow old clients to read EC data
                 Key: HDFS-16875
                 URL: https://issues.apache.org/jira/browse/HDFS-16875
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: ec, erasure-coding
            Reporter: Jing Zhao
            Assignee: Jing Zhao


Erasure Coding is supported only in Hadoop 3, while many production deployments 
still depend on Hadoop 2. Upgrading the whole data stack to Hadoop 3 can 
require a large migration effort and carries reliability risks, given the 
incompatibilities between the two major releases and the possibility of 
undiscovered issues in the newer release. We therefore need a solution that 
adopts Erasure Coding for cost efficiency with minimal migration effort and 
risk, while still allowing old HDFS clients (Hadoop 2.x) to access EC data 
transparently.

Internally we have developed an EC access proxy that translates EC data 
for old clients. We have also extended the NameNode RPC so that it can tell 
whether an HDFS client supports EC, and it redirects clients without EC 
support to the proxy. With the proxy, we have set up separate Erasure Coding 
clusters storing hundreds of PB of data, while leaving the other production 
clusters and all upper-layer applications untouched.
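
To make the redirect idea concrete, here is a minimal sketch of the routing 
decision only. It is not the implementation from this ticket: the class name, 
the enum, and the idea that a client advertises a capability string in the 
RPC header are all assumptions made for illustration.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the routing decision. None of these names come
 * from HDFS or from the code referenced in this ticket.
 */
final class EcReadRouter {

    /** Where a read request should be served from. */
    enum Target { DATANODES, EC_PROXY }

    /** Assumed capability string an EC-aware client would advertise. */
    static final String EC_CAPABILITY = "erasure-coding";

    private EcReadRouter() {}

    /**
     * Decide where to send a read.
     *
     * @param clientCapabilities capabilities the client advertised (assumed
     *                           to be carried in the client-NN RPC header)
     * @param fileIsErasureCoded whether the requested file uses an EC layout
     */
    static Target route(Set<String> clientCapabilities, boolean fileIsErasureCoded) {
        // Replicated files can be read directly by any client.
        if (!fileIsErasureCoded) {
            return Target.DATANODES;
        }
        // EC files: only EC-aware clients read striped blocks directly;
        // everyone else is sent to the proxy.
        return clientCapabilities.contains(EC_CAPABILITY)
                ? Target.DATANODES
                : Target.EC_PROXY;
    }
}
```

In this sketch a Hadoop 2.x client, which advertises nothing, would be routed 
to EC_PROXY for an EC file and to DATANODES for a replicated file, so existing 
applications would not need to change.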

Because some of the changes touch fundamental components of HDFS (e.g., the 
client-NN RPC header), we do not aim to merge them into trunk. We will use 
this ticket to share the design and implementation details (including the code) 
and to collect feedback. We may later open-source the implementation in a 
separate GitHub repo.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
