[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492522#comment-15492522 ]
Tim Barrett commented on TIKA-2058:
-----------------------------------
private void processMsgEmbeddedInMsg(InformationGranule msgGranule, Path resourceFilePath,
        ResourceSet parentResourceSet, AttachmentChunks attachment) throws Throwable {
    InputStream embeddedMsgFilePathInputStream = null;
    OutputStream outStream = null;
    POIFSFileSystem poifsFileSystem = null;
    try {
        MAPIMessage embeddedMAPIMessage = attachment.getEmbeddedMessage();
        poifsFileSystem = new POIFSFileSystem();
        EntryUtils.copyNodes(attachment.attachmentDirectory.getDirectory(),
                poifsFileSystem.getRoot());
        Path targetDir = FileSystems.getDefault().getPath(
                resourceFilePath.getParent().toString() + "/attachments");
        /*
         * Creates directory if not already there
         */
        try {
            Files.createDirectory(targetDir);
        } catch (IOException ignore) {
        }
        String embeddedMessageName = null;
        try {
            String conversationTopic = embeddedMAPIMessage.getConversationTopic();
            conversationTopic =
                    NalandaStringUtilities.stripSpecialCharactersFromString(conversationTopic);
            embeddedMessageName = conversationTopic + ".msg";
        } catch (ChunkNotFoundException cnfe) {
            embeddedMessageName = this.messageNameCounter + ".msg";
            this.messageNameCounter++;
        }
        if (embeddedMessageName != null) {
            if (embeddedMessageName.length() > 200) {
                logger.warn("Embedded attachment has filename longer than 200 characters: "
                        + embeddedMessageName);
                StringBuilder strBldrEmbeddedFileName = new StringBuilder();
                strBldrEmbeddedFileName.append(UUID.randomUUID().toString());
                strBldrEmbeddedFileName.append(".msg");
                embeddedMessageName = strBldrEmbeddedFileName.toString();
                logger.warn("Embedded attachment has filename with long name saved as "
                        + embeddedMessageName);
            }
            File msgFileToWrite = new File(targetDir.toString() + "/" + embeddedMessageName);
            outStream = new FileOutputStream(msgFileToWrite);
            poifsFileSystem.writeFilesystem(outStream);
            outStream.close();
            Path embeddedMsgFilePath =
                    FileSystems.getDefault().getPath(msgFileToWrite.getPath());
            embeddedMsgFilePathInputStream = Files.newInputStream(embeddedMsgFilePath);
            NalandaResourceHandler attachmentResourceHandler =
                    new NalandaResourceHandler(this.parentResourceSet, this.jsonParseFailures,
                            this.jsonPasswordFailures, this.filesCouldNotParseList);
            boolean isEmbeddedInMsg = true;
            attachmentResourceHandler.processEmbeddedResource(msgGranule,
                    msgFileToWrite.getName(), embeddedMsgFilePathInputStream,
                    parentResourceSet, embeddedMsgFilePath, null, null, null, isEmbeddedInMsg);
        }
    } catch (Throwable t) {
        logger.warn("Exception occurred processing embedded message in: "
                + msgGranule.getValue() + " embedded message has not been processed", t);
    } finally {
        if (poifsFileSystem != null) {
            // poifsFileSystem.close();
        }
        if (embeddedMsgFilePathInputStream != null) {
            embeddedMsgFilePathInputStream.close();
        }
        if (outStream != null) {
            outStream.close();
        }
    }
}
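
For illustration only, and not part of the application code above: the finally block closes each resource by hand and leaves poifsFileSystem.close() commented out. A minimal sketch of the same cleanup using try-with-resources follows. TrackedResource is a hypothetical stand-in built only from JDK classes; it assumes the real objects (the file system object and the streams) implement java.io.Closeable or AutoCloseable. The sketch records the order of events to show that resources are closed in reverse order of declaration, and that they are already closed by the time a catch block runs.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {

    /** Hypothetical stand-in for a closeable resource such as a file system object or a stream. */
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /**
     * Opens two resources, optionally fails inside the try body, and returns
     * the recorded sequence of open/close events.
     */
    static List<String> run(boolean failInBody) {
        List<String> log = new ArrayList<>();
        try (TrackedResource fs = new TrackedResource("fileSystem", log);
             TrackedResource out = new TrackedResource("outStream", log)) {
            if (failInBody) {
                throw new IllegalStateException("simulated failure");
            }
            log.add("work done");
        } catch (IllegalStateException e) {
            // Both resources have already been closed when this block runs.
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

With this pattern no explicit null checks or close() calls are needed in a finally block, and a failure while writing cannot skip the cleanup of a resource that was opened earlier.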
> Memory Leak in Tika version 1.13 when parsing millions of files
> ---------------------------------------------------------------
>
> Key: TIKA-2058
> URL: https://issues.apache.org/jira/browse/TIKA-2058
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Tim Barrett
> Attachments: Yourkit screenshot.png, poi-3.15-beta1-p1.jar,
> poi-3.15-beta1-p1.pom, prevents-OOM-when-writable-is-false.patch,
> screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> We have an application using Tika that parses roughly 7,000,000 files of
> different types; many of them are MSG files with attachments. This works
> correctly with Tika 1.9 and has been in production for over a year, with
> parsing runs taking place every few weeks. The same application runs out of
> Java heap memory when using Tika 1.13.
> I have used lsof and File Leak Detector to track down open files; however,
> neither shows any open files while the application is running. I did find an
> issue with open files (https://issues.apache.org/jira/browse/TIKA-2015), but
> there was a workaround for it, and it is not the cause of this problem.
> I am sorry to report this so vaguely, but with lsof turning nothing up I am
> a bit stuck as to how to investigate further. We are more than willing to
> help by testing any ideas provided.