Hi all,

I wanted to suggest the possibility of backporting the client-side erasure coding changes to branch-2, and to get feedback on whether this is (1) desirable, and (2) feasible without also backporting the server-side changes.

Currently, client-side code to support erasure coding hasn't been backported to branch-2, and as a result, both reads and writes of erasure-coded data with 2.x clients fail:

* Running "hdfs dfs -get" on an erasure-coded file with a 2.9 client fails with "java.io.IOException: Unexpected EOS from the reader" coming from DFSInputStream.readWithStrategy(DFSInputStream.java:964).
* Writing to an erasure-coded directory via "hdfs dfs -put" with a Hadoop 2.9 client fails with a NotReplicatedYetException. (Writing the same file to a directory that doesn't use erasure coding succeeds with the 2.9 client, and writing the file to the directory with erasure coding succeeds using a 3.2 client.)

I think it's desirable to backport the client-side erasure coding support to branch-2. We currently have wire compatibility that allows 2.x clients to run on 3.x clusters, but these clients can't make use of one of the most compelling features of Hadoop 3.

That said, I don't know the code well enough to say whether it's possible to backport the client-side changes without also pulling in the server-side changes. If it isn't, the scope of the backport increases dramatically.

I'm hoping people can weigh in on whether this is something we want to do, and also on whether it's something we can do without backporting the server-side changes as well. If this is a reasonable request, I'll file a JIRA for it.

Thanks,
Steve