Hi, I was wondering if the dev team would reconsider using Hadoop 3.3.1 for the next release of Accumulo. I noticed that the Hadoop dependency was briefly updated to 3.3.1 in commit 3c3a91f7a4b6ea290a383a77844cabae34eaeb1f, but it was dropped back to 3.3.0 shortly after in commit 48679fef73e246de52fbeecad03f974f2116b97a. The explanation for reverting the change was that Hadoop 3.3.1 was causing intermittent IT failures, most frequently in CountNameNodeOpsBulkIT.
I checked out that commit myself and confirmed that CountNameNodeOpsBulkIT was frequently failing with an IOException: "Unable to close file because the last block...does not have enough number of replicas". I don't believe that's indicative of a bug in Hadoop or the Accumulo code; more likely, the multithreaded test was overwhelming the MiniDFSCluster with requests. I'm not sure which default value or behavior changed in Hadoop 3.3.1 that would cause the minicluster to fall over where it didn't in 3.3.0, but I noticed the issue is resolved in later commits. The ClientContext changes in the very next code change (commit 4b66b96b8f6c65c390fc26c11acf8c51cb78d858) resolve the IT failures that were the reason for moving the Hadoop version back to 3.3.0. If you check out that or any later commit and update the Hadoop dependency version in the parent pom to 3.3.1, the IT failures go away.

The reason I'd like Hadoop 3.3.1 to make it into the next Accumulo release is that I've been experimenting with Accumulo using S3 as the underlying file system. HADOOP-17597 (https://issues.apache.org/jira/browse/HADOOP-17597), which landed in Hadoop 3.3.1, makes it possible to use the S3AFileSystem defined in hadoop-aws with Accumulo, replacing HDFS with S3. The only change needed is to update the manager.walog.closer.implementation property and supply an S3LogCloser implementation on the classpath.

Thanks,
Chris
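For reference, the two changes I'm describing look roughly like this in my setup. This is just a sketch: I'm assuming the parent pom exposes the Hadoop version via a hadoop.version property, and S3LogCloser here is my own custom class (not something shipped with Accumulo) implementing Accumulo's log-closer interface.

```xml
<!-- parent pom: bump the Hadoop dependency from 3.3.0 to 3.3.1
     (assumes the version is managed via a hadoop.version property) -->
<properties>
  <hadoop.version>3.3.1</hadoop.version>
</properties>
```

```properties
# accumulo.properties: point the WAL closer at an S3-aware implementation.
# org.example.S3LogCloser is a hypothetical name for my custom class,
# which must be on the server classpath.
manager.walog.closer.implementation=org.example.S3LogCloser
```

With those two changes and the S3A connector configured, I haven't needed any other modifications to run against S3.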