[
https://issues.apache.org/jira/browse/TIKA-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18070124#comment-18070124
]
ASF GitHub Bot commented on TIKA-4704:
--------------------------------------
Copilot commented on code in PR #2728:
URL: https://github.com/apache/tika/pull/2728#discussion_r3017899281
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -152,7 +152,7 @@ protected static String
getStringFromInputStream(InputStream in) throws IOExcept
public static InputStream gzip(InputStream is) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
OutputStream gz = new GzipCompressorOutputStream(bos);
- IOUtils.copy(is, gz);
+ is.transferTo(gz);
gz.flush();
gz.close();
return new ByteArrayInputStream(bos.toByteArray());
Review Comment:
`gzip(InputStream)` creates a `GzipCompressorOutputStream` and closes it
only on the happy path. If `transferTo()` throws, the gzip stream won’t be
closed, which can leave the compressor in an inconsistent state and leak
resources. Use try-with-resources (or a finally) to guarantee the gzip stream
is closed even on exceptions.
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -201,7 +202,7 @@ public void setUp() throws Exception {
// Initialize PipesParsingHelper for pipes-based parsing
// Merge the fetcher config with basePath pointing to the temp
directory
- this.pipesConfigPath = mergeFetcherConfig(this.pipesConfigPath,
inputTempDirectory);
+ this.pipesConfigPath = mergeFetcherConfig(tmpPipesConfigPath,
inputTempDirectory);
TikaJsonConfig tikaJsonConfig =
TikaJsonConfig.load(this.pipesConfigPath);
Review Comment:
In `setUp()`, `tmpPipesConfigPath` is always a temp file created by
`mergePassbackAllStrategy()`/`createDefaultTestConfig()`, but it is never
deleted after `mergeFetcherConfig(...)` writes a new temp config. This leaves
an intermediate temp file behind for each test run. Please delete
`tmpPipesConfigPath` once the merged config has been written (ideally in a
finally that won’t delete the final `pipesConfigPath`).
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -268,6 +271,8 @@ private Path mergePassbackAllStrategy(InputStream
pipesConfigInputStream) throws
/**
* Merges the tika-server-fetcher configuration into the pipes config.
* The fetcher is configured with basePath pointing to the input temp
directory.
+ *
+ * @return new (different) JSON configPath in the temp director
Review Comment:
Javadoc typo: "temp director" should be "temp directory", and the `@return`
sentence should end with a period.
```suggestion
* @return new (different) JSON configPath in the temp directory.
```
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -186,12 +186,13 @@ public void setUp() throws Exception {
Files.copy(getTikaConfigInputStream(), tmp,
StandardCopyOption.REPLACE_EXISTING);
InputStream pipesConfigInputStream = getPipesConfigInputStream();
+ Path tmpPipesConfigPath;
if (pipesConfigInputStream != null) {
// Test provided its own pipes config - merge in PASSBACK_ALL
emit strategy
- this.pipesConfigPath =
mergePassbackAllStrategy(pipesConfigInputStream);
+ tmpPipesConfigPath =
mergePassbackAllStrategy(pipesConfigInputStream);
} else {
// Create a default pipes config, merging metadata-filters
from tika config
- this.pipesConfigPath = createDefaultTestConfig(tmp);
+ tmpPipesConfigPath = createDefaultTestConfig(tmp);
}
Review Comment:
`pipesConfigInputStream` is obtained from `getPipesConfigInputStream()` and
passed into `mergePassbackAllStrategy(...)` without being closed. Even though
the current base implementation returns a `ByteArrayInputStream`, subclasses
can override this to return a stream that requires closing. Wrap
`pipesConfigInputStream` in try-with-resources when non-null.
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -452,9 +459,8 @@ protected Map<String, byte[]>
readZipArchiveBytes(InputStream inputStream) throw
Enumeration<ZipArchiveEntry> entries = zip.getEntries();
while (entries.hasMoreElements()) {
ZipArchiveEntry entry = entries.nextElement();
- ByteArrayOutputStream bos = new ByteArrayOutputStream();
- IOUtils.copy(zip.getInputStream(entry), bos);
- data.put(entry.getName(), bos.toByteArray());
+ byte[] bytes = zip.getInputStream(entry).readAllBytes();
+ data.put(entry.getName(), bytes);
}
Review Comment:
In `readZipArchiveBytes(...)`, the per-entry stream returned by
`zip.getInputStream(entry)` is not closed. Wrap that stream in
try-with-resources before calling `readAllBytes()` to avoid leaking file
descriptors.
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/CXFTestBase.java:
##########
@@ -429,9 +437,8 @@ protected Map<String, String> readZipArchive(InputStream
inputStream) throws IOE
Enumeration<ZipArchiveEntry> entries = zip.getEntries();
while (entries.hasMoreElements()) {
ZipArchiveEntry entry = entries.nextElement();
- ByteArrayOutputStream bos = new ByteArrayOutputStream();
- IOUtils.copy(zip.getInputStream(entry), bos);
- data.put(entry.getName(),
DigestUtils.md5Hex(bos.toByteArray()));
+ byte[] bytes = zip.getInputStream(entry).readAllBytes();
+ data.put(entry.getName(), DigestUtils.md5Hex(bytes));
}
Review Comment:
In `readZipArchive(...)`, the per-entry stream returned by
`zip.getInputStream(entry)` is not closed. This can leak file handles across
many entries. Wrap the entry stream in try-with-resources before calling
`readAllBytes()`.
> Avoid remaining temp files
> --------------------------
>
> Key: TIKA-4704
> URL: https://issues.apache.org/jira/browse/TIKA-4704
> Project: Tika
> Issue Type: Task
> Affects Versions: 3.3.0
> Reporter: Tilman Hausherr
> Priority: Minor
> Fix For: 4.0.0, 3.3.1
>
> Attachments: screenshot-1.png
>
>
> This is my temp directory after a successful build of tika 3. We should try
> to lessen this.
> !screenshot-1.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)