[ 
https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825384#comment-17825384
 ] 

Gregory Lepore commented on TIKA-4208:
--------------------------------------

I extracted all the files from the ARC file and went through the sas7bdat files 
one by one. All of them processed correctly with the JSON option except the 
attached file (table23.sas7bdat.zip), which threw the error above.

Hopefully that will help others to figure out what's going on. Thanks!
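
For anyone who wants to repeat the one-by-one check on the extracted files, here is a rough sketch of the loop, using only the JDK. FailingFileFinder and FileAction are made-up names, not Tika classes, and the action passed in is a placeholder for whatever actually runs Tika on a single file. Catching OutOfMemoryError and carrying on is best effort only; it is assumed to be tolerable here because this particular error is raised when the array request is rejected rather than after the heap fills up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Tika): runs an action on each matching file and records failures. */
final class FailingFileFinder {

    /** Placeholder for the real per-file step, e.g. a call that parses one file with Tika. */
    @FunctionalInterface
    interface FileAction {
        void run(Path file) throws Exception;
    }

    /** Returns the files under dir whose names end with suffix and for which the action threw. */
    static List<Path> findFailures(Path dir, String suffix, FileAction action) throws IOException {
        List<Path> candidates;
        try (Stream<Path> walk = Files.walk(dir)) {
            candidates = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<Path> failures = new ArrayList<>();
        for (Path file : candidates) {
            try {
                action.run(file);
            } catch (Exception | OutOfMemoryError e) {
                // OutOfMemoryError is an Error, not an Exception, so it is named explicitly.
                System.err.println("FAILED " + file + ": " + e);
                failures.add(file);
            }
        }
        return failures;
    }
}
```

Calling FailingFileFinder.findFailures(extractedDir, ".sas7bdat", action) returns the list of files whose action threw, which is how a single bad file can be singled out from the rest.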

> OOM error in SAS7BDATParser
> ---------------------------
>
>                 Key: TIKA-4208
>                 URL: https://issues.apache.org/jira/browse/TIKA-4208
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 3.0.0-BETA
>            Reporter: Gregory Lepore
>            Priority: Minor
>         Attachments: table23.sas7bdat.zip
>
>
> For this ARC file:
> [https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-000/warc/NARA-PEOT-2004-20041019023240-02598-crawling008-c_NARA-PEOT-2004-20041019053819-01693-crawling007.archive.org.arc.gz]
> I'm getting an OOM error:
> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>        at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
>        at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
>        at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:740)
>        at java.base/java.lang.StringBuffer.append(StringBuffer.java:410)
>        at java.base/java.io.StringWriter.write(StringWriter.java:99)
>        at org.apache.tika.sax.ToTextContentHandler.characters(ToTextContentHandler.java:96)
>        at org.apache.tika.sax.ToXMLContentHandler.writeEscaped(ToXMLContentHandler.java:229)
>        at org.apache.tika.sax.ToXMLContentHandler.characters(ToXMLContentHandler.java:154)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>        at org.apache.tika.parser.RecursiveParserWrapper$RecursivelySecureContentHandler.characters(RecursiveParserWrapper.java:370)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>        at org.apache.tika.sax.SafeContentHandler.access$101(SafeContentHandler.java:47)
>        at org.apache.tika.sax.SafeContentHandler.lambda$new$0(SafeContentHandler.java:57)
>        at org.apache.tika.sax.SafeContentHandler$$Lambda$327/0x00007f94a022d1a8.write(Unknown Source)
>        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:106)
>        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:250)
>        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:270)
>        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:295)
>        at org.apache.tika.parser.sas.SAS7BDATParser.parse(SAS7BDATParser.java:146)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203)
>        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:153)
>        at org.apache.tika.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:259)
>        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:71)
>        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
>        at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:455)
> when extracting JSON with both the app and server version of 3.0.0 BETA.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
