[ https://issues.apache.org/jira/browse/TIKA-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474650#comment-17474650 ]
Tika User commented on TIKA-3642:
---------------------------------

I tried setMaxMainMemoryBytes but am still seeing memory issues. With Tika 1.27, the same file shows neither the memory issue nor the infinite loop. We are worried about the infinite loop because it would be a serious problem in production, and we are seeing it only after the latest upgrade to 2.2.1. Can you please suggest a safe way to handle this infinite loop?

I also tried ForkParser, but it affects our code in many places. We usually point to a config.xml and pass that config to the AutoDetectParser; with ForkParser we don’t have that option. I tried the code below. Alternative solutions would be much appreciated, at least so that we can handle the infinite loop on our side.

    // Init fork parser
    List<String> javaArgs = new ArrayList<String>();
    forkParser = new ForkParser();
    javaArgs.add("java");
    javaArgs.add("-Xmx3048m"); // Specify maximum heap space for parsing documents
    forkParser.setJavaCommand(javaArgs);
    forkParser.setPoolSize(1);
    try (FileInputStream inputData = new FileInputStream(path)) {
        config = TikaConfigFactory.getTikaConfig();
        Parser autoDetectParser = new AutoDetectParser(config);
        ParseContext context = new ParseContext();
        context.set(TikaConfig.class, config);
        if (!largefile) {
            autoDetectParser.parse(inputData, handler, metadata, context);
        } else {
            forkParser.parse(inputData, handler, metadata, context);
        }
    }

Can we use ForkParser the same way as AutoDetectParser, by passing the config to the constructor?

> Getting java.lang.OutOfMemoryError: Java heap space when parsing PDF file
> -------------------------------------------------------------------------
>
>                 Key: TIKA-3642
>                 URL: https://issues.apache.org/jira/browse/TIKA-3642
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tika User
>            Priority: Major
>
> When parsing large PDF files (1.65 GB) we are getting an out of memory error.
> The version we are using is 2.0.25 (pdfbox).
> java.lang.OutOfMemoryError: Java heap space at
> org.apache.pdfbox.pdfparser.COSParser.isString

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
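
On the request in the comment for a way to handle the infinite loop from the caller's side: one generic option, independent of Tika, is to run the parse call on a worker thread and stop waiting after a deadline. The following is only a minimal sketch under stated assumptions. The class name ParseWithTimeout and the helper runWithTimeout are hypothetical, and the lambdas are stand-ins for a real call such as autoDetectParser.parse(...). It uses only java.util.concurrent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: bounds how long the caller waits for a task.
public class ParseWithTimeout {

    // Runs the task on a worker thread and waits at most timeoutMillis for its result.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; code that never checks for interruption keeps running.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a parse call that finishes in time.
        String text = runWithTimeout(() -> "parsed", 1_000);
        System.out.println(text);

        // Stand-in for a parse call that hangs.
        try {
            runWithTimeout(() -> { Thread.sleep(60_000); return "never"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("gave up after timeout");
        }
    }
}
```

A timeout like this only stops the caller from waiting. It cannot force a thread that ignores interruption to stop, and it does not contain an OutOfMemoryError raised in the same JVM, so it complements parsing in a separate process (as ForkParser does) and does not replace it.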