[ 
https://issues.apache.org/jira/browse/HDFS-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9263:
--------------------------------
    Attachment: HDFS-9263-002.patch

[[email protected]], regarding the side discussion on HADOOP-11880, I have 
traced the problem to {{TestMiniDFSCluster}}, and the problem only occurs while 
running with the HDFS-9263 patch applied.  I hope you don't mind, but I'm 
attaching a v002 patch with a small modification to fix it.

My only change is in {{GenericTestUtils}}.  Prepare to smack forehead.  Here is 
the patch v001 code:

{code}
  public static final String DEFAULT_TEST_DATA_DIR =
      "target " + File.pathSeparator + "test" + File.pathSeparator + "data";
{code}

Here is my change in v002:

{code}
  public static final String DEFAULT_TEST_DATA_DIR =
      "target" + File.separator + "test" + File.separator + "data";
{code}

I removed the extra space character at the end of the "target" string literal, 
and I switched from {{File.pathSeparator}} (i.e. the classpath separator, ':' 
on *nixes) to {{File.separator}} (i.e. the file system path separator, '/' on 
*nixes).  I constantly mix up those two myself.  I wish they had clearer names.
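
For reference, a minimal sketch of the difference between the two constants 
(the class name and strings are mine, not code from the patch):

{code}
import java.io.File;

public class SeparatorDemo {
  public static void main(String[] args) {
    // File.pathSeparator separates entries in a classpath-style list:
    // ':' on *nixes, ';' on Windows.
    String wrong = "target " + File.pathSeparator + "test";  // "target :test" on *nixes

    // File.separator separates components within a single path:
    // '/' on *nixes, '\' on Windows.
    String right = "target" + File.separator + "test";       // "target/test" on *nixes

    System.out.println(wrong);
    System.out.println(right);
  }
}
{code}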

As to why {{TestMiniDFSCluster}} exposed this: one of the tests in that suite 
specifically removes the {{test.build.data}} property to check whether the 
mini-cluster can still start using defaults.  After running that suite, I 
could see it had created funny paths containing spaces and colons.  For code 
that passes the path through a {{URI}}, the space would also end up encoded as 
%20.
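
A quick illustration of that encoding behavior, using plain {{File#toURI}} 
(not code from the patch; the sample path is mine):

{code}
import java.io.File;
import java.net.URI;

public class UriSpaceDemo {
  public static void main(String[] args) {
    // A relative path containing a space, like the one the v001 bug produced.
    File bad = new File("target test");

    // File#toURI percent-encodes the space as %20, so code comparing or
    // logging the URI form sees a string that no longer matches the raw path.
    URI uri = bad.toURI();
    System.out.println(uri);
  }
}
{code}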

bq. If we not only consolidate test dir setup, but do it in a way that isolates 
it for each test suite, we get that isolation.

I'm on board with the consolidation aspect, but it's still unclear to me what 
benefit adding another random string into the path provides.  I suppose if the 
sub-directory were named to match the test suite, then that would have some 
benefit for post-mortem analysis after a test failure.  You could go back and 
inspect metadata and blocks, and you'd know you were looking at files specific 
to that test suite.
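
To make the trade-off concrete, here is a hypothetical helper (not code from 
either patch; the class and method names are mine) that names the sub-directory 
after the test suite instead of a random string:

{code}
import java.io.File;

// Hypothetical sketch: deriving a per-suite data directory so leftover
// metadata and blocks can be traced back to the suite that wrote them.
public class PerSuiteDataDir {

  public static File suiteDataDir(Class<?> testSuite) {
    // Fall back to the consolidated default when test.build.data is unset,
    // mirroring the v002 default in GenericTestUtils.
    String base = System.getProperty("test.build.data",
        "target" + File.separator + "test" + File.separator + "data");
    // Naming the sub-directory after the suite aids post-mortem analysis,
    // at the cost of many directories that nothing cleans between suites.
    return new File(base, testSuite.getSimpleName());
  }

  public static void main(String[] args) {
    System.out.println(suiteDataDir(PerSuiteDataDir.class));
  }
}
{code}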

OTOH, this has the side effect of using many more directories, and they won't 
get cleaned up in between runs of different suites.  Typically, the data gets 
wiped between suite runs, either explicitly via {{FileUtil#fullyDelete}}, or 
implicitly via things like a NameNode format.  I tried a full test run of 
hadoop-hdfs, and then I saw this:

{code}
> du -hs ~/git/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/test/data
6.1G    
/home/cnauroth/git/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/test/data
{code}

That's more disk consumption than I'm used to seeing from a test run.  I'm 
pretty sure I'd need to reallocate volumes on some of my wimpier VMs to 
accommodate this.

> tests are using /test/build/data; breaking Jenkins
> --------------------------------------------------
>
>                 Key: HDFS-9263
>                 URL: https://issues.apache.org/jira/browse/HDFS-9263
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>         Environment: Jenkins
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>         Attachments: HDFS-9263-001.patch, HDFS-9263-002.patch
>
>
> Some of the HDFS tests are using the path {{test/build/data}} to store files, 
> thereby leaking files which fail the new post-build RAT test checks on 
> Jenkins (and dirtying all development systems with paths which {{mvn clean}} 
> will miss).
> fix



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
