Hi,

I was wondering if the dev team would reconsider using Hadoop 3.3.1
for the next release of Accumulo. I noticed that the Hadoop dependency
version was briefly updated to 3.3.1 by
commit 3c3a91f7a4b6ea290a383a77844cabae34eaeb1f, but it was dropped back
to 3.3.0 shortly after, in commit 48679fef73e246de52fbeecad03f974f2116b97a.
The explanation for undoing the change was that Hadoop 3.3.1 was causing
intermittent IT failures, most frequently in CountNameNodeOpsBulkIT.

I checked out that commit myself and saw the same thing:
CountNameNodeOpsBulkIT was failing often with an IOException, "Unable to
close file because the last block...does not have enough number of
replicas". I don't believe that failure indicates a bug in Hadoop or in
the Accumulo code. What's more likely is that the multithreaded test was
overwhelming the MiniDFSCluster with requests. I'm not sure which
default value or behavior changed in Hadoop 3.3.1 that would cause the
minicluster to fall over when it didn't in 3.3.0, but the issue is
resolved in later commits. It looks like the ClientContext changes in
the very next code change (commit
4b66b96b8f6c65c390fc26c11acf8c51cb78d858) resolve the IT failures that
prompted moving the Hadoop version back to 3.3.0. If you check out that
or any later commit and update the Hadoop dependency version in the
parent pom to 3.3.1, the IT failures go away.
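
For reference, here's roughly what the bump looks like; I'm assuming
the version is controlled by a hadoop.version property in the parent
pom.xml, so check the actual property name in the pom you build from:

    <!-- parent pom.xml: bump the Hadoop version property
         (property name assumed; confirm against the actual pom) -->
    <properties>
      <hadoop.version>3.3.1</hadoop.version>
    </properties>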

The reason I'd like Hadoop 3.3.1 to make it into the next release of
Accumulo is that I've been experimenting with Accumulo using S3 as the
underlying file system. A change added in Hadoop 3.3.1
(https://issues.apache.org/jira/browse/HADOOP-17597) makes it possible
to use the S3AFileSystem defined in hadoop-aws with Accumulo, replacing
HDFS with S3. The only Accumulo-side change needed is to update the
manager.walog.closer.implementation property and supply an S3LogCloser
implementation on the classpath; a rough sketch of both follows.
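
On the configuration side, something like this (the S3LogCloser class
name is my own placeholder; fs.s3a.downgrade.syncable.exceptions is the
option HADOOP-17597 added, and setting it explicitly avoids relying on
the default):

    # accumulo.properties: point WAL recovery at the custom closer
    # (class name is a placeholder for whatever is on the classpath)
    manager.walog.closer.implementation=com.example.S3LogCloser

    # Hadoop configuration (e.g. core-site.xml): downgrade S3A's
    # rejection of hflush/hsync from an exception to a warning
    fs.s3a.downgrade.syncable.exceptions=true

And a minimal sketch of the closer itself, assuming the LogCloser
interface shape I see in recent Accumulo sources (the package and exact
signature vary across versions, e.g. with the master-to-manager rename,
so adjust for the version you're on):

    import java.io.IOException;

    import org.apache.accumulo.core.conf.AccumuloConfiguration;
    import org.apache.accumulo.server.fs.VolumeManager;
    import org.apache.accumulo.server.master.recovery.LogCloser;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    /**
     * Hypothetical log closer for S3. Unlike HDFS, S3A has no lease
     * recovery to perform before a write-ahead log can be replayed,
     * so there is nothing to do here.
     */
    public class S3LogCloser implements LogCloser {
      @Override
      public long close(AccumuloConfiguration conf, Configuration hadoopConf,
          VolumeManager fs, Path path) throws IOException {
        // Returning 0 signals no further waiting is needed, per the
        // LogCloser contract as I read it.
        return 0;
      }
    }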

Thanks,
Chris
