Hi all,

I'd like to propose backporting the client-side erasure coding changes to
branch-2, and to get feedback on whether this is (1) desirable and (2)
feasible without also backporting the server-side changes.

Currently, the client-side code that supports erasure coding hasn't been
backported to branch-2. As a result, 2.x clients fail both to read and to
write erasure-coded data:

* Running "hdfs dfs -get" on an erasure-coded file with a 2.9 client fails
with "java.io.IOException: Unexpected EOS from the reader" coming
from DFSInputStream.readWithStrategy(DFSInputStream.java:964).
* Writing to an erasure-coded directory via "hdfs dfs -put" with a Hadoop
2.9 client fails with a NotReplicatedYetException. (Writing the same file
to a directory that doesn't use erasure coding succeeds with the 2.9
client, and writing the file to the directory with erasure coding succeeds
using a 3.2 client.)

I think it's desirable to backport the client-side erasure coding support
to branch-2. Currently we have wire compatibility that allows 2.x clients
to run on 3.x clusters; however, these clients can't make use of one of the
most compelling features of Hadoop 3.

However, I don't know the code well enough to say whether the client-side
changes can be backported without also pulling in the server-side changes.
If they can't, the scope of the backport increases dramatically.

I'm hoping people can weigh in on whether this is something we want to do,
and also on whether it's something we can do without backporting the
server-side changes as well.

If this is a reasonable request, I'll file a JIRA for it.

Thanks,
Steve