[https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845586#comment-17845586]
ASF GitHub Bot commented on TIKA-4254:
--------------------------------------
kaiyaok2 commented on PR #1754:
URL: https://github.com/apache/tika/pull/1754#issuecomment-2105675512
> The `repo` is refreshed with each unit test in the `@BeforeEach` call, though. Is NIODetector respecting that?
@tballison Yes, NIOInspector uses the JUnit Jupiter engine and takes into
account all setup and teardown methods. Note that although the `MimeTypes`
instance `repo` is refreshed, `MimeTypes.addPattern()` calls `Patterns.add()`,
which then calls `addGlob()`:
```java
private void addGlob(String glob, MimeType type) throws MimeTypeException {
    MimeType previous = globs.get(glob);
    if (previous == null ||
            registry.isSpecializationOf(previous.getType(), type.getType())) {
        globs.put(glob, type);
    } else if (previous == type ||
            registry.isSpecializationOf(type.getType(), previous.getType())) {
        // do nothing
    } else {
        throw new MimeTypeException("Conflicting glob pattern: " + glob);
    }
}
```
In the second execution of the test, `previous` is the `testType` object
constructed in the first test run, while `type` is the `testType` object
constructed in the second test run (from two different calls to
`new MimeType(MediaType.parse("foo/bar"))`). Since `previous` and `type` are
distinct objects and neither type is a specialization of the other, the
exception is thrown.
Ideally, repeated runs should reach the `// do nothing` branch, hence the fix.
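The branch logic above can be illustrated with a minimal, self-contained sketch. This is not Tika code: `SimpleType`, `GlobTable`, and `tryAdd` are hypothetical stand-ins, `IllegalStateException` replaces `MimeTypeException`, and both `isSpecializationOf` checks are stubbed to return `false`, assuming (as the observed exception implies for two `foo/bar` types) that neither type specializes the other. It shows that the check relies on object identity (`previous == type`): re-adding the same instance is a no-op, while a separately constructed type with the same name conflicts.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobConflictDemo {

    /** Hypothetical stand-in for MimeType; the table compares these by identity. */
    static final class SimpleType {
        final String name;
        SimpleType(String name) { this.name = name; }
    }

    /** Simplified analogue of the addGlob() excerpt, not Tika's implementation. */
    static final class GlobTable {
        private final Map<String, SimpleType> globs = new HashMap<>();

        // Stub (assumption): neither type specializes the other, as for two "foo/bar" types.
        private boolean isSpecializationOf(SimpleType a, SimpleType b) {
            return false;
        }

        void addGlob(String glob, SimpleType type) {
            SimpleType previous = globs.get(glob);
            if (previous == null || isSpecializationOf(previous, type)) {
                globs.put(glob, type);
            } else if (previous == type || isSpecializationOf(type, previous)) {
                // same instance registered again: do nothing
            } else {
                throw new IllegalStateException("Conflicting glob pattern: " + glob);
            }
        }
    }

    /** Returns "ok" on success, or the conflict message if addGlob throws. */
    static String tryAdd(GlobTable table, String glob, SimpleType type) {
        try {
            table.addGlob(glob, type);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String glob = "rtg_sst_grb_0\\.5\\.\\d{8}";
        GlobTable table = new GlobTable(); // shared across both "runs"

        SimpleType first = new SimpleType("foo/bar");
        SimpleType second = new SimpleType("foo/bar"); // same name, different object

        System.out.println(tryAdd(table, glob, first));  // glob unused: stored
        System.out.println(tryAdd(table, glob, first));  // previous == type: no-op
        System.out.println(tryAdd(table, glob, second)); // distinct object: conflict
    }
}
```

Running `main` prints `ok` twice, followed by `Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}`.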
> The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the
> first run and fails in repeated runs in the same environment.
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: TIKA-4254
> URL: https://issues.apache.org/jira/browse/TIKA-4254
> Project: Tika
> Issue Type: Bug
> Reporter: Kaiyao Ke
> Priority: Major
>
> ### Brief Description of the Bug
> The test `TestMimeTypes#testJavaRegex` is non-idempotent: it passes in the
> first run but fails in the second run in the same environment. The source of
> the problem is that each test execution constructs a new media type
> (`MimeType`) instance `testType` (the same problem applies to `testType2`),
> while the instances from all test executions use the same name pattern
> `"rtg_sst_grb_0\\.5\\.\\d{8}"`. Therefore, in the second execution of the
> test, the line `this.repo.addPattern(testType, pattern, true);` throws an
> exception, since the name pattern is already registered to the `testType`
> instance created in the first test execution. Specifically, in the second
> run, the `addGlob()` method of the `Patterns` class detects the conflicting
> pattern and throws a `MimeTypeException` (line 123 in `Patterns.java`).
> ### Failure Message in the 2nd Test Run:
> ```
> org.apache.tika.mime.MimeTypeException: Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}
> at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
> at org.apache.tika.mime.Patterns.add(Patterns.java:71)
> at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
> at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
> at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
> ```
> ### Reproduce
> Use the `NIOInspector` plugin that supports rerunning individual tests in the
> same environment:
> ```
> cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
> mvn edu.illinois:NIOInspector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
> ```
> ### Proposed Fix
> Declare `testType` and `testType2` as static variables and initialize them at
> class-loading time, so that repeated runs of `testJavaRegex()` register the
> same instances and no longer conflict with each other. All tests pass and are
> idempotent after the fix.
> ### Necessity of Fix
> A fix is recommended because unit tests should be idempotent, and state
> pollution should be mitigated so that newly introduced tests do not fail in
> the future because of polluted shared state.
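The effect of the proposed fix can be sketched without Tika. In this hypothetical harness, `REGISTRY` stands in for repository state that outlives a single test execution (an assumption consistent with the failure above), `addPattern` keeps only the identity branches of `addGlob()`, and `runTwice` mimics rerunning one test in the same JVM, as NIOInspector does. The per-call version constructs a fresh type each run and fails on the second run; the static version reuses one instance, so the second registration takes the no-op branch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdempotenceDemo {

    /** Hypothetical stand-in for MimeType; compared by identity. */
    static final class SimpleType {
        final String name;
        SimpleType(String name) { this.name = name; }
    }

    /** Shared state that survives between test executions (assumption). */
    static final Map<String, SimpleType> REGISTRY = new HashMap<>();

    static final String GLOB = "rtg_sst_grb_0\\.5\\.\\d{8}";

    // Proposed fix: one instance, created once at class-loading time.
    static final SimpleType STATIC_TYPE = new SimpleType("foo/bar");

    /** Identity-based registration mirroring addGlob() without specialization checks. */
    static void addPattern(SimpleType type, String glob) {
        SimpleType previous = REGISTRY.get(glob);
        if (previous == null) {
            REGISTRY.put(glob, type);
        } else if (previous != type) {
            throw new IllegalStateException("Conflicting glob pattern: " + glob);
        }
        // previous == type: already registered, do nothing
    }

    /** Before the fix: each execution constructs a new type. */
    static void testWithLocalType() {
        SimpleType testType = new SimpleType("foo/bar");
        addPattern(testType, GLOB);
    }

    /** After the fix: each execution reuses the static type. */
    static void testWithStaticType() {
        addPattern(STATIC_TYPE, GLOB);
    }

    /** Runs a test body twice against the same registry and reports each outcome. */
    static String runTwice(Runnable test) {
        REGISTRY.clear(); // isolate the two scenarios from each other
        List<String> outcomes = new ArrayList<>();
        for (int run = 1; run <= 2; run++) {
            try {
                test.run();
                outcomes.add("run " + run + ": pass");
            } catch (IllegalStateException e) {
                outcomes.add("run " + run + ": fail");
            }
        }
        return String.join(", ", outcomes);
    }

    public static void main(String[] args) {
        System.out.println("local type:  " + runTwice(IdempotenceDemo::testWithLocalType));
        System.out.println("static type: " + runTwice(IdempotenceDemo::testWithStaticType));
    }
}
```

Running `main` prints `local type:  run 1: pass, run 2: fail` and `static type: run 1: pass, run 2: pass`, matching the before/after behavior described in the issue.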
--
This message was sent by Atlassian Jira
(v8.20.10#820010)