On 07/10/13 23:49, Manuel Suárez Sánchez wrote:
1. scan the source, building a strongly-typed, immutable domain model
This point is basic to improve the project because now there aren´t a good
domain model and it´s very confused.
I think that the question comes down to granularity.
Here's one way that the two contrasting approach might work...
With the full model approach, the source would be scanned completed into
a model before the document contents were analysed. Once the analysis
was complete, then the reporting would start. The process flow would be
course-grained. This would cut across the grain of the current Rat design.
With a message oriented architecture, the scanner would send each
document to enrichment as soon as it was created. The enricher would
take a look at the contents and add document-level meta-data, then pass
on the enriched object as soon as it was created. Aggregate analysers
would then build up the report. This would be sympathetic to the current
Rat design.
Retaining a streaming/messaging architecture means modelling at the
message level (rather than more complete structures)
<snip>
However, I think that the current streaming design isn't particularly
intuitive or obvious. I would be happy to retain an improved streaming
design.
I think that apache rat is a release audit tool, focused on licenses. In
the project you analyse a file(audio) and you get the license of the file. Why
do you try to use streaming/message driven architecture?
Performance at small memory footprint
Robert