Multiple patternsets in a fileset

Bruce Atherton 17 Feb 2002 23:12:27 -0000

I've submitted what I considered (and still consider) a bug to Bugzilla (http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6510) and was told that the existing behaviour was by design.

I'd like to challenge that design. To my mind, it is seriously broken, and saying "It's not a bug, it's a feature" is no answer.

The issue, for those who didn't follow the link, is that multiple PatternSets within a single fileset affect one another, even when they were meant to describe separate groups of files. So if you have:

  <fileset dir="a">
    <patternset>
      <include name="*.java" />
      <exclude name="test*" />
    </patternset>
    <patternset>
      <include name="*.class" />
      <exclude name="old*" />
    </patternset>
  </fileset>

Then test*.class and old*.java are excluded, despite the exclusions being in different patternsets.
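
In other words, as far as I can tell, the fileset above ends up being treated as if all the patterns had been thrown into one pool, roughly like this:

  <fileset dir="a">
    <!-- both includes and both excludes apply to every file -->
    <include name="*.java" />
    <include name="*.class" />
    <exclude name="test*" />
    <exclude name="old*" />
  </fileset>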

Not only is this counterintuitive behaviour for the user, it takes away functionality for no good reason.

Consider the situation where you want to perform an exclusion on a directory tree, but you want one subdirectory to be treated differently. This is not uncommon. You might want to exclude developers' test cases scattered throughout the codebase but include the end user tests stored in the "tests" directory. Or you may want to leave out all class files unless they are under the "dist" directory. Or you may want to get rid of CVS directories unless the subdirectory contains a separate project that yours depends on. I could go on and on.
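
To sketch the first of these cases, this is roughly what I would want to be able to write if each patternset were evaluated on its own. The directory name "src" and the "*Test.java" naming pattern are made up for illustration:

  <fileset dir="src">
    <!-- everything except the developers' scattered test cases -->
    <patternset>
      <include name="**/*.java" />
      <exclude name="**/*Test.java" />
    </patternset>
    <!-- ...plus all of the end user tests, whatever they are called -->
    <patternset>
      <include name="tests/**/*.java" />
    </patternset>
  </fileset>

With the current mingling, the exclude in the first patternset also knocks out the matching files under "tests", so the second patternset cannot bring them back.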

Having separation in patternsets gives you this functionality. Then there is the question of what a user would expect. If you were encountering an ant build.xml file for the first time, would you expect old*.java to be excluded? Honestly?

My question is, what does mingling get you? I can't see that it gets you much. The ability to extend an existing pattern, perhaps. It would be ironic if that were the use case held up, since the code goes to quite some length to stop referenced patterns from being extended this way.

Just for some background on why I'm bringing this up: as I mentioned to the list a couple of months ago, I think the best implementation of cullers is through making them elements of patternset. I've gone quite a ways down that path, and have a patch almost ready to submit for discussion. But in developing this, I've found that because of the mingling of patterns in a fileset, cullers would have to be applied across an entire fileset rather than to the specific files they were defined on. So if in your distribution you want all the log files that weren't larger than a Meg, and all the JAR files regardless of size, you couldn't do:

  <fileset dir="**">
    <patternset>
      <include name="*.log" />
      <sizeculler size="1000000" mustbe="less" />
    </patternset>
    <patternset>
      <include name="*.jar" />
    </patternset>
  </fileset>

because it would filter out jar files larger than a Meg too. This is obviously a made-up example, but as you get more complicated, with cullers on date, on dependency data, on presence or lack thereof in another tree, on grepping, or on any of the other things people want cullers for, the mingling behaviour becomes disastrous. Even if you don't like this way of implementing cullers, the fact that any design along these lines runs afoul of the current mingling behaviour suggests strongly to me that the latter is seriously borked.

That's how I see it, anyway. How about you?

