Daniel Knapp
Tue, 24 Nov 2009 04:18:05 -0800
Hello, I'm trying to parse about 4 GB of data. With the following code it always ends in a Java heap space error (OutOfMemoryError). I think there must be a better way to do this, but I don't know how. Does anybody have a hint for how to solve this problem? I don't think increasing the heap size in Eclipse should be the solution.
File dir = new File("/path/to/files/");
DirEdit d = new DirEdit();
d.listDir(dir);
ArrayList<String> ll = d.getList();
System.out.println(ll.size());
Iterator<String> it = ll.iterator();
Parser parser = new AutoDetectParser();
ParseContext context = new ParseContext();
context.set(Parser.class, parser);
StringWriter textBuffer = new StringWriter();
ContentHandler handler = new TeeContentHandler(
        getTextContentHandler(textBuffer));
Metadata md = new Metadata();
while (it.hasNext()) {
    // Fetch the path once; each call to next() advances the iterator.
    String path = it.next();
    System.out.println(path);
    File file = new File(path);
    FileInputStream input = new FileInputStream(file);
    try {
        parser.parse(input, handler, md, context);
        System.out.println("Content-Type: "
                + md.get("Content-Type") + "\n");
    } catch (TikaException e) {
        System.out.println("Error with File: "
                + file.getName() + " - "
                + e.getMessage());
    } catch (IOException e) {
        System.out.println("Error with File: "
                + file.getName() + " - "
                + e.getMessage());
    } finally {
        input.close(); // note that you need to explicitly close the stream
    }
}
}
private static ContentHandler getTextContentHandler(Writer writer) {
    return new BodyContentHandler(writer);
}
Regards,
Daniel