tika-user  

JavaHeapSpace - Parsing 4GB of data recursively

Daniel Knapp
Tue, 24 Nov 2009 04:18:05 -0800

Hello,

I'm trying to parse about 4GB of data. With the following code it always 
results in a Java heap space error. I think there must be a better way to do 
this, but I don't know how.
Does anybody have a hint for me on how to solve this problem? I think 
increasing the heap space in Eclipse should not be the solution.


        File dir = new File("/path/to/files/");
        DirEdit d = new DirEdit();
        d.listDir(dir);
        ArrayList<String> ll = d.getList();
        System.out.println(ll.size());
        Iterator<String> it = ll.iterator();

        Parser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        StringWriter textBuffer = new StringWriter();

        ContentHandler handler = new TeeContentHandler(
                getTextContentHandler(textBuffer));
        Metadata md = new Metadata();

        while (it.hasNext()) {
            // Call next() only once per iteration; calling it twice
            // prints one path but opens the following one.
            String path = it.next();
            System.out.println(path);
            File file = new File(path);

            FileInputStream input = new FileInputStream(file);
            try {
                parser.parse(input, handler, md, context);
                System.out.println("Content-Type: "
                        + md.get("Content-Type") + "\n");
            } catch (TikaException e) {
                System.out.println("Error with File: "
                        + file.getName() + " - " + e.getMessage());
            } catch (IOException e) {
                System.out.println("Error with File: "
                        + file.getName() + " - " + e.getMessage());
            } finally {
                // note that you need to explicitly close the stream
                input.close();
            }
        }
    }
    private static ContentHandler getTextContentHandler(Writer writer) {
        return new BodyContentHandler(writer);
    }
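
One possible explanation, offered as a guess rather than a diagnosis: the loop
above reuses a single StringWriter (and a single Metadata object) for every
file, so the extracted text of all files accumulates in one buffer. The sketch
below shows the per-file alternative without depending on Tika, so that it runs
on its own. The class name PerFileExtraction and the Extractor interface are
hypothetical stand-ins for the parser.parse(...) call; in the Tika code the
equivalent change would presumably be to create the StringWriter, the
BodyContentHandler and the Metadata inside the while loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

final class PerFileExtraction {

    /** Hypothetical stand-in for the real text extraction step. */
    interface Extractor {
        void extract(InputStream input, Writer out) throws IOException;
    }

    /** Walks root recursively and returns the number of characters extracted per file. */
    static List<Integer> run(Path root, Extractor extractor) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        // Files.walk yields paths as the tree is traversed; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                Path file = it.next();
                // Fresh buffer for every file: the previous file's text
                // becomes unreachable and can be garbage collected.
                StringWriter text = new StringWriter();
                try (InputStream input = Files.newInputStream(file)) {
                    extractor.extract(input, text);
                    sizes.add(text.getBuffer().length());
                } catch (IOException e) {
                    System.out.println("Error with file: "
                            + file.getFileName() + " - " + e.getMessage());
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Small demo tree in a temporary directory.
        Path root = Files.createTempDirectory("per-file-demo");
        Files.write(root.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Files.write(root.resolve("b.txt"), "hi".getBytes(StandardCharsets.UTF_8));

        // Demo extractor: copies the file content as UTF-8 text.
        Extractor copyText = (in, out) ->
                out.write(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        System.out.println(run(root, copyText));
    }
}
```

With a fresh buffer per file, only one file's text is held at a time. If the
text is needed afterwards, writing it to disk or to an index inside the loop
would avoid keeping all of it in memory.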


Regards,
Daniel
