[ 
https://issues.apache.org/jira/browse/RAT-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781683#comment-17781683
 ] 

Claude Warren commented on RAT-325:
-----------------------------------

That is what I would expect.  If you edit the configuration file 
/org/apache/rat/default.xml and remove the "not" clause on lines 22-24, I 
expect you will see a speedup.

The issue is that "not" has to process the block of input to the end before 
it can determine that the enclosed option is false, in this case that the 
copyright does not exist.

Since it has to process to the end, and since the Copyright check is a regex 
applied to every line, it is expensive.
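
To illustrate the cost, here is a minimal sketch, not Rat's actual code.  The 
class name, the method and the copyright pattern are all hypothetical 
stand-ins.  It counts how many lines a per-line regex check has to examine: 
a positive match can stop at the first hit, but concluding that the copyright 
is absent (the case where "not" is true) means running the regex on every line.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; these are not Rat's classes.
final class NotCostSketch {
    // Assumed stand-in for the copyright regex; the real pattern may differ.
    static final Pattern COPYRIGHT = Pattern.compile("Copyright\\s+\\d{4}");

    /**
     * Returns the number of lines examined before the per-line check could
     * stop. A match allows an early exit; the absence of a match is only
     * known once every line has been tested.
     */
    static int linesExamined(List<String> header) {
        int examined = 0;
        for (String line : header) {
            examined++;
            if (COPYRIGHT.matcher(line).find()) {
                return examined; // enclosed matcher is true: stop early
            }
        }
        return examined; // enclosed matcher is false: whole block was scanned
    }
}
```

With a copyright on the second line only two lines are examined; with no 
copyright at all the count equals the length of the header block.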

This is where I think the idea of specifying whether a process is line- or 
block-oriented may make sense, though I think that all of the line-oriented 
checks work on the block as well.

The code in o.a.r.analysis.HeaderCheckWorker.readLine() reads each line and 
calls the matcher to see if it matches.  I think it would be much faster to 
modify HeaderCheckWorker.read() so that it reads the entire header block into a 
buffer first and then passes that buffer to the Matcher to see if it matches. 
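
A rough sketch of that idea follows.  It is not the HeaderCheckWorker code: 
the class, readHeaderBlock(), the maxLines limit and the use of Predicate as 
a stand-in for the matcher are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration only; these are not Rat's classes.
final class BufferedHeaderSketch {
    // Assumed stand-in for the copyright regex; the real pattern may differ.
    static final Predicate<String> COPYRIGHT =
            Pattern.compile("Copyright\\s+\\d{4}").asPredicate();

    /** Reads at most maxLines lines into one buffer, newline-terminated. */
    static String readHeaderBlock(Reader source, int maxLines) {
        StringBuilder buffer = new StringBuilder();
        try {
            BufferedReader reader = new BufferedReader(source);
            for (int count = 0; count < maxLines; count++) {
                String line = reader.readLine();
                if (line == null) {
                    break; // end of input before the limit was reached
                }
                buffer.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    /** "not" becomes a single negation of one pass over the buffer. */
    static boolean not(Predicate<String> enclosed, String block) {
        return !enclosed.test(block);
    }
}
```

Here each matcher, including "not", is evaluated once against the whole 
block instead of once per line.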

This will amortise the "Not", "regex" and "long text" costs.

Also, we should be able to simplify the "long text" checks, as each instance 
will no longer have to build the buffer itself.  In the future we can probably 
convert them to a regex, provided we do some work when we build the buffer to 
extract only the comment text from the source files.  This would be a further 
optimization.

> Performance degradation compared to 0.15
> ----------------------------------------
>
>                 Key: RAT-325
>                 URL: https://issues.apache.org/jira/browse/RAT-325
>             Project: Apache Rat
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 0.16
>            Reporter: Jean-Baptiste Onofré
>            Priority: Major
>             Fix For: 0.16
>
>
> While testing 0.16-SNAPSHOT, I found that rat takes much longer to execute 
> than with 0.15.
> I'm investigating why.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
