[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139491#comment-17139491 ]
suchendra commented on TIKA-3097:
---------------------------------

Sure. One more question: does Tika have a content handler that extracts only the text matching a regex?

> Out of memory while parsing docx
> --------------------------------
>
> Key: TIKA-3097
> URL: https://issues.apache.org/jira/browse/TIKA-3097
> Project: Tika
> Issue Type: Bug
> Components: core, parser
> Affects Versions: 1.24
> Reporter: suchendra
> Priority: Major
> Attachments: Screenshot from 2020-05-07 08-14-25.png, samplefile.txt, test.docx
>
>
> I have written simple Scala code to extract the content from an uploaded .docx file. The JVM runs out of memory (OOM) when Tika tries to parse the file. I configured the JVM heap at 1 GB and also tried 2 GB; the same issue occurs both with the jar and in my code.
> Attached the file for reference.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
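One possible approach to the regex question is a custom handler. This is a minimal sketch, not a Tika class.

Tika's `Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)` accepts any SAX `org.xml.sax.ContentHandler`. So you can pass in a handler that keeps only the regex matches instead of the full text.

The example makes a few assumptions, all hypothetical:

* `RegexMatchingHandler` is an illustrative name.
* The set of "block" element names is illustrative.
* Applying the regex one block at a time (for example one `<p>` of Tika's XHTML output) is a design choice. A match that spans two blocks is not found.

This bounds only the text the handler holds on to. It does not reduce memory used inside the parser itself, so it is not by itself a fix for the OOM.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer
import scala.util.matching.Regex

// Hypothetical handler (not part of Tika): buffers the text of one block
// element at a time and keeps only the substrings that match `pattern`.
class RegexMatchingHandler(
    pattern: Regex,
    blockElements: Set[String] = Set("p", "h1", "h2", "h3", "li", "td")
) extends DefaultHandler {

  private val buffer = new StringBuilder
  val matches: ListBuffer[String] = ListBuffer.empty[String]

  // Tika emits namespaced XHTML, so localName is normally set; fall back to
  // qName for SAX parsers that are not namespace-aware.
  private def name(localName: String, qName: String): String =
    if (localName != null && localName.nonEmpty) localName else qName

  // Flush at the start of a block too, so text preceding a nested block is
  // not glued onto the block's own text.
  override def startElement(uri: String, localName: String, qName: String,
                            atts: Attributes): Unit =
    if (blockElements.contains(name(localName, qName))) flush()

  // Text may arrive in several chunks per block, so accumulate it.
  override def characters(ch: Array[Char], start: Int, length: Int): Unit =
    buffer.append(new String(ch, start, length))

  override def endElement(uri: String, localName: String, qName: String): Unit =
    if (blockElements.contains(name(localName, qName))) flush()

  override def endDocument(): Unit = flush()

  // Record the matches in the current block, then discard its text, so the
  // handler holds at most one block of text at a time.
  private def flush(): Unit = {
    matches ++= pattern.findAllIn(buffer.toString)
    buffer.clear()
  }
}

object RegexHandlerDemo {
  // Drives the handler with the JDK's SAX parser; with Tika you would pass
  // the same handler to parser.parse(stream, handler, metadata, context).
  def extract(xhtml: String, pattern: Regex): List[String] = {
    val handler = new RegexMatchingHandler(pattern)
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new InputSource(new StringReader(xhtml)), handler)
    handler.matches.toList
  }

  def main(args: Array[String]): Unit = {
    val doc = "<html><body><p>Order 12345 shipped</p><p>Invoice 678 and 90</p></body></html>"
    println(extract(doc, """\d+""".r).mkString(",")) // prints 12345,678,90
  }
}
```

Matching per block is the trade-off here. It keeps the handler's memory proportional to the largest paragraph rather than the whole document. If a pattern can legitimately cross paragraph boundaries, you would need a different strategy, such as a sliding window over the text.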