Essentially, Rat is simple.
A source (perhaps a file system or a compressed archive) is walked,
producing documents. Each document (perhaps a file in a file system, or
a resource in an archive) flows through a pipeline - a series of
processing steps, each enriching it with various meta-data. An end point
collates the data.
It seems to me that the current code fails to express this
...
At the moment, IDocumentAnalyser[1] is implemented by most steps in the
pipeline (and other stuff too), wired together in a potentially flexible
fashion. This now seems over-engineered to me.
I think a concrete Pipeline would be more obvious, with controlled
extension points at each step of the processing.
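
A rough sketch of the idea, with every name below (Document, Step,
EndPoint, Pipeline and the two example steps) invented for illustration
rather than taken from the Rat code base:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PipelineSketch {

    /** Hypothetical document: a name plus the meta-data gathered so far. */
    static final class Document {
        final String name;
        final Map<String, String> metaData = new LinkedHashMap<>();

        Document(String name) {
            this.name = name;
        }
    }

    /** Hypothetical extension point: one processing step enriching a document. */
    interface Step {
        void enrich(Document document);
    }

    /** Hypothetical end point: collates each fully processed document. */
    interface EndPoint {
        void collate(Document document);
    }

    /** Concrete pipeline: steps run in a fixed order, then the end point. */
    static final class Pipeline {
        private final List<Step> steps;
        private final EndPoint endPoint;

        Pipeline(List<Step> steps, EndPoint endPoint) {
            this.steps = steps;
            this.endPoint = endPoint;
        }

        /** Walks the source and pushes every document through all steps. */
        void process(Iterable<Document> source) {
            for (Document document : source) {
                for (Step step : steps) {
                    step.enrich(document);
                }
                endPoint.collate(document);
            }
        }
    }

    /** Runs two made-up steps over two made-up documents. */
    static List<String> demo() {
        // Step 1 (illustrative only): record the file extension.
        Step extension = d -> {
            int dot = d.name.lastIndexOf('.');
            d.metaData.put("extension", dot < 0 ? "" : d.name.substring(dot + 1));
        };
        // Step 2 (illustrative only): classify using meta-data from step 1.
        Step kind = d -> d.metaData.put("kind",
                "java".equals(d.metaData.get("extension")) ? "source" : "other");

        List<String> report = new ArrayList<>();
        EndPoint collector = d -> report.add(d.name + " " + d.metaData);

        Pipeline pipeline = new Pipeline(Arrays.asList(extension, kind), collector);
        pipeline.process(Arrays.asList(new Document("README.txt"),
                                       new Document("Main.java")));
        return report;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The point of the sketch is only that the order of processing lives in
one visible place (Pipeline.process), while each step stays a small,
replaceable piece behind a narrow interface.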
Opinions...?
Objections...?
Robert
[1]
http://svn.apache.org/viewvc/creadur/rat/trunk/apache-rat-core/src/main/java/org/apache/rat/document/IDocumentAnalyser.java?view=markup