org.apache.hadoop.yarn.util.FSDownload restricts tar archive extensions to .tar.gz, .tgz, and .tar. If the file name ends with one of these extensions, the input stream is passed to FileUtil.unTar.

On non-Windows systems, org.apache.hadoop.fs.FileUtil.unTar calls unTarUsingTar, which writes the input stream to the stdin of a shell command that effectively pipes it through tar -x. If the file is a .tar.gz or .tgz, the stream is first run through gzip -dc before being piped into tar -x.

This is very restrictive, given that tar supports many different compression types. Specifically, we would like to add ZStandard-compressed tar archives to our distributed cache. However, these are not recognized as archives because the .tar.zst and .tzst extensions are not supported. Support could be added either by prepending "zstd -dc | " to the tar command, as is already done for gzip, or by running tar -x --zstd.

Given all the work that has gone into Hadoop to support ZStandard since Hadoop 3.x, this seems like a reasonable update. Would it be possible to add ZStandard support to distributed cache archives?
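
To make the proposal concrete, here is a minimal sketch of the extension-to-command mapping described above. It is not the actual FSDownload or FileUtil code: the class TarCommandSketch and the methods decompressorFor and buildUntarCommand are hypothetical names used only for illustration, and it assumes the "zstd -dc | " variant rather than tar -x --zstd. The directory quoting is deliberately naive; a real change would need to escape the path safely before handing it to a shell.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of Hadoop.
final class TarCommandSketch {

    /**
     * Returns the decompression command to run before tar, or null for a
     * plain .tar file. Throws if the extension is not a recognized tar archive.
     */
    static String decompressorFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".tar.gz") || lower.endsWith(".tgz")) {
            return "gzip -dc";
        }
        // Proposed addition: ZStandard-compressed tar archives.
        if (lower.endsWith(".tar.zst") || lower.endsWith(".tzst")) {
            return "zstd -dc";
        }
        if (lower.endsWith(".tar")) {
            return null;
        }
        throw new IllegalArgumentException("Not a recognized tar archive: " + fileName);
    }

    /**
     * Builds the shell pipeline that would read the archive from stdin and
     * extract it into targetDir. Assumes targetDir contains no single quotes.
     */
    static String buildUntarCommand(String fileName, String targetDir) {
        String decompressor = decompressorFor(fileName);
        String untar = "cd '" + targetDir + "' && tar -x";
        return decompressor == null ? untar : decompressor + " | (" + untar + ")";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"deps.tar", "deps.tgz", "deps.tar.zst"}) {
            System.out.println(name + " -> " + buildUntarCommand(name, "/tmp/out"));
        }
    }
}
```

With this shape, supporting another compression format only means adding one more extension check, and the rest of the pipeline stays unchanged. This variant does require the zstd binary to be present on every node that localizes the archive.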
Distributed Cache Archives restricted to GZip Tars, no ZStandard
David McCauley via common-dev Fri, 08 May 2026 12:34:24 -0700
