Hello,
An *OutOfMemoryError* occurs during *filter-media* execution when processing *~150 MB XLSX files*. The error appears to occur while the text is being extracted from these files.

I have already enabled textextractor.use-temp-file = true and increased the Java (JVM) memory as shown below, but the issue persists:

Environment="JAVA_OPTS=-Xmx12096M -Xms6024M -XX:MaxMetaspaceSize=2024M -Dfile.encoding=UTF-8"

*filter-media configuration in dspace.cfg:*

#### Media Filter / Format Filter plugins (through PluginService) ####
# Media/Format Filters help to full-text index content or
# perform automated format conversions

#Names of the enabled MediaFilter or FormatFilter plugins
filter.plugins = Text Extractor
filter.plugins = JPEG Thumbnail
filter.plugins = PDFBox JPEG Thumbnail
# [To enable Branded Preview]: uncomment and insert the following into the plugin list
# Branded Preview JPEG, \
# [To enable ImageMagick Thumbnail]:
# remove "JPEG Thumbnail" from the plugin list
# uncomment and insert the following line into the plugin list
# ImageMagick Image Thumbnail, ImageMagick PDF Thumbnail, \
# [To enable ImageMagick Video Thumbnails (requires both ImageMagick and ffmpeg installed)]:
# uncomment and insert the following line into the plugin list
# ImageMagick Video Thumbnail, \
# NOTE: pay attention to the ImageMagick policies and resource limits in its policy.xml
# configuration file. The limits may have to be increased if a "cache resources
# exhausted" error is thrown.

#Assign 'human-understandable' names to each filter
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.TikaTextExtractionFilter = Text Extractor
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.PDFBoxThumbnail = PDFBox JPEG Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter = ImageMagick Image Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter = ImageMagick PDF Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickVideoThumbnailFilter = ImageMagick Video Thumbnail

#Configure each filter's input format(s)
# NOTE: The TikaTextExtractionFilter can support any file formats that are supported by Apache Tika. So, you can easily
# add additional formats to your DSpace Bitstream Format registry and list them here.
# The current list of Tika supported formats is available at: https://tika.apache.org/2.3.0/formats.html
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = CSV
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = HTML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Excel
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Excel XML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Powerpoint
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Powerpoint XML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Word XML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = OpenDocument Presentation
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = OpenDocument Spreadsheet
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = OpenDocument Text
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = RTF
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Text
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats = BMP, GIF, JPEG, PNG
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, GIF, JPEG, PNG
filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, PNG, JPG, TIFF, JPEG, JPEG 2000
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.ImageMagickVideoThumbnailFilter.inputFormats = Video MP4
filter.org.dspace.app.mediafilter.PDFBoxThumbnail.inputFormats = Adobe PDF

#Publicly accessible thumbnails of restricted content.
#List the MediaFilter names that would get publicly accessible permissions
#Any media filters not listed will instead inherit the permissions of the parent bitstream
#filter.org.dspace.app.mediafilter.publicPermission = JPEGFilter

Can anyone help with this? Thanks in advance,

Manuela Klanovicz Ferreira

-- 
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/dspace-tech/19442e7a-9550-4c52-ae79-6a47e116afc1n%40googlegroups.com.
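One way to check whether the heap settings in the JAVA_OPTS line above actually reach the JVM that runs filter-media is a tiny standalone class like the sketch below. The class name `HeapCheck` is hypothetical and not part of DSpace; it only prints the maximum heap the running JVM will try to use and the options the JVM was started with. Note that `Environment=` is systemd unit syntax: variables set that way apply only to processes started by that unit, so a filter-media run launched from a shell or a cron job may need JAVA_OPTS set in that environment as well (assuming the dspace launcher script passes JAVA_OPTS on to the JVM).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/**
 * Diagnostic sketch (hypothetical helper, not part of DSpace): reports the heap
 * limit and JVM options of the JVM it runs in. Launch it with the same JAVA_OPTS
 * as filter-media to see whether the -Xmx/-Xms values are actually applied.
 */
public class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap this JVM will attempt to use, in MiB (from Runtime.maxMemory()). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Options passed to the JVM itself (e.g. -Xmx, -Xms), excluding program arguments. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments:  " + jvmArguments());
    }
}
```

Run it from the same shell or job that starts filter-media, e.g. `java $JAVA_OPTS HeapCheck.java` (single-file source launch requires Java 11 or later). If the reported maximum is far below roughly 12 GB, the JAVA_OPTS setting is not reaching that JVM; depending on the garbage collector, the value may also be slightly lower than the -Xmx figure.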
