[ 
https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845560#comment-17845560
 ] 

ASF GitHub Bot commented on TIKA-4254:
--------------------------------------

kaiyaok2 opened a new pull request, #1754:
URL: https://github.com/apache/tika/pull/1754

   Fixes https://issues.apache.org/jira/projects/TIKA/issues/TIKA-4254
   
   ### Brief Description of the Bug
   
   The test `TestMimeTypes#testJavaRegex` is not idempotent: it passes on the 
first run but fails on the second run in the same environment. Each test 
execution creates a new media type (`MimeType`) instance `testType` (and 
likewise `testType2`), and every execution registers the same name pattern 
`"rtg_sst_grb_0\\.5\\.\\d{8}"` for it. As a result, in the second execution the 
line `this.repo.addPattern(testType, pattern, true);` throws, because the 
pattern is already registered to the `testType` instance created by the first 
execution. Specifically, the `addGlob()` method of the `Patterns` class detects 
the conflicting pattern and throws a `MimeTypeException` (line 123 in 
`Patterns.java`).
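
The mechanism can be illustrated with a minimal sketch that uses a simplified stand-in for the pattern registry. `NonIdempotentSketch`, `GlobRegistry`, `FakeType`, and `runTestBody` are hypothetical names, not Tika classes, and the conflict rule is an assumption modeled on the behavior described above (a pattern already bound to a different instance is rejected).

```java
import java.util.HashMap;
import java.util.Map;

public class NonIdempotentSketch {
    /** Hypothetical stand-in for a media type; only object identity matters here. */
    static final class FakeType {
        final String name;
        FakeType(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the shared pattern registry. */
    static final class GlobRegistry {
        private final Map<String, FakeType> globs = new HashMap<>();

        // Assumed rule: a pattern already bound to a different instance is a conflict.
        void addPattern(FakeType type, String pattern) {
            FakeType previous = globs.get(pattern);
            if (previous != null && previous != type) {
                throw new IllegalStateException("Conflicting glob pattern: " + pattern);
            }
            globs.put(pattern, type);
        }
    }

    // Shared across test executions, like the repository used by the test.
    static final GlobRegistry REPO = new GlobRegistry();

    // Mirrors the shape of the test body: a fresh instance on every execution.
    static void runTestBody() {
        FakeType testType = new FakeType("example/type");
        REPO.addPattern(testType, "rtg_sst_grb_0\\.5\\.\\d{8}");
    }

    public static void main(String[] args) {
        runTestBody(); // first execution: the pattern is registered
        try {
            runTestBody(); // second execution: new instance, same pattern
            System.out.println("second run passed");
        } catch (IllegalStateException e) {
            System.out.println("second run failed: " + e.getMessage());
        }
    }
}
```

In this model the second call fails because the registry outlives a single execution while the type instance does not.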
   
   ### Failure Message in the 2nd Test Run:
   ```
   org.apache.tika.mime.MimeTypeException: Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}
        at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
        at org.apache.tika.mime.Patterns.add(Patterns.java:71)
        at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
        at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
   ```
   
   ### Reproduce
   
   Use the `NIOInspector` plugin, which supports rerunning individual tests in 
the same environment:
   ```
   cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
   mvn edu.illinois:NIODetector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
   ```
   
   ### Proposed Fix
   
   Declare `testType` and `testType2` as static variables and initialize them 
at class loading time. Repeated runs of `testJavaRegex()` then reuse the same 
instances and no longer conflict with each other. All tests pass and are 
idempotent after the fix.
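
A sketch of the idea, using hypothetical stand-ins rather than the actual test code changed in this PR: `StaticFieldFixSketch`, `GlobRegistry`, `FakeType`, and `runTestBody` are illustrative names, and the sketch assumes that registering a pattern again for the same instance is not treated as a conflict.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticFieldFixSketch {
    /** Hypothetical stand-in for a media type; only object identity matters here. */
    static final class FakeType {
        final String name;
        FakeType(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the shared pattern registry. */
    static final class GlobRegistry {
        private final Map<String, FakeType> globs = new HashMap<>();

        // Assumed rule: only a *different* instance bound to the pattern is a conflict.
        void addPattern(FakeType type, String pattern) {
            FakeType previous = globs.get(pattern);
            if (previous != null && previous != type) {
                throw new IllegalStateException("Conflicting glob pattern: " + pattern);
            }
            globs.put(pattern, type);
        }
    }

    static final GlobRegistry REPO = new GlobRegistry();

    // Created once at class-loading time and reused by every execution.
    static final FakeType TEST_TYPE = new FakeType("example/type");

    // Every execution registers the pattern for the same instance.
    static void runTestBody() {
        REPO.addPattern(TEST_TYPE, "rtg_sst_grb_0\\.5\\.\\d{8}");
    }

    public static void main(String[] args) {
        runTestBody(); // first execution
        runTestBody(); // second execution: same instance, so no conflict in this model
        System.out.println("both runs passed");
    }
}
```

The design choice here is to give the type instance the same lifetime as the shared registry, instead of resetting the registry between runs.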
   
   ### Necessity of Fix
   
   A fix is recommended because unit tests should be idempotent, and shared 
state pollution should be avoided so that newly introduced tests do not fail 
in the future because of it.
   
   
   




> The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the 
> first run and fails in repeated runs in the same environment. 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-4254
>                 URL: https://issues.apache.org/jira/browse/TIKA-4254
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Kaiyao Ke
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
