[
https://issues.apache.org/jira/browse/HDFS-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated HDFS-9263:
--------------------------------
Attachment: HDFS-9263-002.patch
[[email protected]], regarding the side discussion on HADOOP-11880, I have
traced the problem to {{TestMiniDFSCluster}}, and the problem only occurs while
running with the HDFS-9263 patch applied. I hope you don't mind, but I'm
attaching a v002 patch with a small modification to fix it.
My only change is in {{GenericTestUtils}}. Prepare to smack forehead. Here is
the patch v001 code:
{code}
public static final String DEFAULT_TEST_DATA_DIR =
"target " + File.pathSeparator + "test" + File.pathSeparator + "data";
{code}
Here is my change in v002:
{code}
public static final String DEFAULT_TEST_DATA_DIR =
"target" + File.separator + "test" + File.separator + "data";
{code}
I removed the extra space character at the end of the "target" string literal,
and I switched from {{File.pathSeparator}} (i.e. the classpath separator, ':' on
*nixes) to {{File.separator}} (i.e. the file system path separator, '/' on
*nixes). I constantly mix up those two myself. I wish they had clearer names.
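To make the distinction concrete, here is a small standalone sketch (class name is mine, and the printed values assume a *nix system; on Windows they would be ';' and '\\' respectively):
{code}
import java.io.File;

public class SeparatorDemo {
    public static void main(String[] args) {
        // File.pathSeparator separates entries in a path *list*, e.g. a classpath.
        System.out.println("pathSeparator = " + File.pathSeparator); // ":" on *nix

        // File.separator separates components within a single file path.
        System.out.println("separator = " + File.separator);         // "/" on *nix

        // The intended default test data dir, as in the v002 patch:
        System.out.println("target" + File.separator + "test"
            + File.separator + "data");                              // target/test/data
    }
}
{code}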
As to why {{TestMiniDFSCluster}} exposed this, one of the tests in that suite
specifically removes the {{test.build.data}} property to check if the
mini-cluster can still start using defaults. After running that test suite, I
could see it created the funny paths containing spaces and colons. For code
that passes the path through a {{URI}}, it would end up encoding the space to
%20 too.
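For illustration, a quick sketch of that URI encoding effect (the literal path here is my own example in the shape the v001 bug would produce, not taken from an actual test run):
{code}
import java.io.File;

public class UriEncodingDemo {
    public static void main(String[] args) {
        // A v001-style path: trailing space plus classpath separators.
        File f = new File("target " + File.pathSeparator + "test");
        // File#toURI percent-encodes the space, so the URI contains %20.
        System.out.println(f.toURI());
    }
}
{code}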
bq. If we not only consolidate test dir setup, but do it in a way that isolates
it for each test suite, we get that isolation.
I'm on board with the consolidation aspect, but it's still unclear to me that
there is a benefit of adding another random string into the path. I suppose if
the sub-directory was named to match the test suite, then that has some benefit
for post-mortem analysis after a test failure. You could go back and inspect
metadata and blocks, and you'd know that you were looking at files specific to
that test suite.
OTOH, this has the side effect of using many more directories, and they won't
get cleaned up in between runs of different suites. Typically, the data gets
wiped between suite runs, either explicitly via {{FileUtil#fullyDelete}}, or
implicitly via things like a NameNode format. I tried a full test run of
hadoop-hdfs, and then I saw this:
{code}
> du -hs ~/git/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/test/data
6.1G	/home/cnauroth/git/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/test/data
{code}
That's more disk consumption than I'm used to seeing from a test run. I'm
pretty sure I'd need to reallocate volumes on some of my wimpier VMs to
accommodate this.
> tests are using /test/build/data; breaking Jenkins
> --------------------------------------------------
>
> Key: HDFS-9263
> URL: https://issues.apache.org/jira/browse/HDFS-9263
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0
> Environment: Jenkins
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
> Attachments: HDFS-9263-001.patch, HDFS-9263-002.patch
>
>
> Some of the HDFS tests are using the path {{test/build/data}} to store files,
> so leaking files which fail the new post-build RAT test checks on Jenkins
> (and dirtying all development systems with paths which {{mvn clean}} will
> miss).
> fix
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)