Kaiyao Ke created TIKA-4254:
-------------------------------
Summary: The test `TestMimeTypes#testJavaRegex` is not idempotent,
as it passes in the first run and fails in repeated runs in the same
environment.
Key: TIKA-4254
URL: https://issues.apache.org/jira/browse/TIKA-4254
Project: Tika
Issue Type: Bug
Reporter: Kaiyao Ke
### Brief Description of the Bug
The test `TestMimeTypes#testJavaRegex` is not idempotent: it passes in the
first run but fails in the second run in the same environment. The source of
the problem is that each test execution creates a new media type (`MimeType`)
instance `testType` (the same applies to `testType2`), and the media types
from different test executions all try to register the same name pattern
`"rtg_sst_grb_0\\.5\\.\\d{8}"`. Therefore, in the second execution of the test,
the line `this.repo.addPattern(testType, pattern, true);` throws an error,
since the name pattern is already bound to the `testType` instance created
during the first test execution. Specifically, in the second run, the
`addGlob()` method of the `Patterns` class detects the conflicting pattern and
throws a `MimeTypeException` (line 123 in `Patterns.java`).
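The failure mode can be illustrated with a minimal, self-contained sketch. It does not use Tika: `PatternRegistry`, `runTestBody`, and the use of `IllegalStateException` are hypothetical stand-ins, and the conflict check is simplified to "the pattern is already bound to a different object", which is an assumption about the relevant behavior rather than a copy of `Patterns.addGlob()`.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobConflictSketch {

    /** Hypothetical stand-in for a shared pattern registry; not Tika's actual code. */
    static class PatternRegistry {
        private final Map<String, Object> globs = new HashMap<>();

        void addGlob(String glob, Object type) {
            Object previous = globs.get(glob);
            if (previous == null) {
                // First registration of this pattern.
                globs.put(glob, type);
            } else if (previous != type) {
                // Same pattern, different type object: treated as a conflict (assumption).
                throw new IllegalStateException("Conflicting glob pattern: " + glob);
            }
            // Same pattern, same type object: nothing to do.
        }
    }

    /** Simulates one execution of the test body: a fresh type object on every run. */
    static void runTestBody(PatternRegistry repo) {
        Object testType = new Object();
        repo.addGlob("rtg_sst_grb_0\\.5\\.\\d{8}", testType);
    }

    public static void main(String[] args) {
        // The registry outlives a single test execution, like shared state in one JVM.
        PatternRegistry repo = new PatternRegistry();
        runTestBody(repo); // first run: the pattern is registered
        try {
            runTestBody(repo); // second run: a new object claims the same pattern
            System.out.println("second run passed");
        } catch (IllegalStateException e) {
            System.out.println("second run failed: " + e.getMessage());
        }
    }
}
```

Under these assumptions the second call fails only because the registry still holds the object from the first call, which is the kind of state pollution described above.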
### Failure Message in the 2nd Test Run:
```
org.apache.tika.mime.MimeTypeException: Conflicting glob pattern:
rtg_sst_grb_0\.5\.\d{8}
at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
at org.apache.tika.mime.Patterns.add(Patterns.java:71)
at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
```
### Reproduce
Use the `NIOInspector` plugin that supports rerunning individual tests in the
same environment:
```
cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
mvn edu.illinois:NIODetector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
```
### Proposed Fix
Declare `testType` and `testType2` as static variables and initialize them at
class loading time, so that repeated runs of `testJavaRegex()` reuse the same
instances and do not conflict with each other. All tests pass and are
idempotent after the fix.
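The idea behind the fix can be sketched without Tika as follows. As before, `PatternRegistry`, `TEST_TYPE`, and `runTestBody` are hypothetical names, and the sketch assumes that registering a pattern again for the very same object is accepted as a no-op.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticTypeFixSketch {

    /** Hypothetical stand-in for a shared pattern registry; not Tika's actual code. */
    static class PatternRegistry {
        private final Map<String, Object> globs = new HashMap<>();

        void addGlob(String glob, Object type) {
            Object previous = globs.get(glob);
            if (previous == null) {
                globs.put(glob, type);
            } else if (previous != type) {
                throw new IllegalStateException("Conflicting glob pattern: " + glob);
            }
            // Re-adding the pattern for the same object is a no-op (assumption).
        }

        int size() {
            return globs.size();
        }
    }

    // Created once at class-loading time, so every run sees the same object.
    static final Object TEST_TYPE = new Object();

    /** Simulates one execution of the test body using the shared static type. */
    static void runTestBody(PatternRegistry repo) {
        repo.addGlob("rtg_sst_grb_0\\.5\\.\\d{8}", TEST_TYPE);
    }

    public static void main(String[] args) {
        PatternRegistry repo = new PatternRegistry();
        for (int run = 1; run <= 3; run++) {
            runTestBody(repo); // no conflict: the same object re-registers the pattern
        }
        System.out.println("patterns registered: " + repo.size());
    }
}
```

Because the type object is shared across runs, repeated executions leave the registry in the same state as a single execution, which is what idempotence requires here.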
### Necessity of Fix
A fix is recommended because unit tests should be idempotent, and state
pollution should be mitigated so that newly introduced tests do not fail in
the future due to polluted shared state.