Hello,

An *OutOfMemoryError* occurs during *filter-media* execution when
processing *~150 MB XLSX files*. The error appears to occur while the
system is extracting the text from the files.

I have already enabled textextractor.use-temp-file = true and increased the 
Java (JVM) memory as shown below, but the issue persists.
Environment="JAVA_OPTS=-Xmx12096M -Xms6024M -XX:MaxMetaspaceSize=2024M -Dfile.encoding=UTF-8"
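
To rule out that these settings are simply not reaching the JVM, the effective heap limits can be printed with a small check. This is only a minimal sketch: the class name `HeapCheck` and the helper `toMiB` are hypothetical and not part of DSpace, and it assumes the process that runs filter-media may be started with different options than the ones in the `Environment=` line above.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes (hypothetical helper, not a DSpace API).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the upper bound the JVM will try to use for the heap,
        // which should roughly match the -Xmx value this JVM was started with.
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory() is the heap currently reserved; freeMemory() is the unused part of it.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

On Java 11 or later this can be run directly as `java -Xmx12096M -Xms6024M HeapCheck.java`. A maximum far below the configured value, when started the same way as filter-media, would suggest the options are not being applied to that process.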


*dspace.cfg filter-media configurations below*
#### Media Filter / Format Filter plugins (through PluginService) ####
# Media/Format Filters help to full-text index content or
# perform automated format conversions

#Names of the enabled MediaFilter or FormatFilter plugins
filter.plugins = Text Extractor
filter.plugins = JPEG Thumbnail
filter.plugins = PDFBox JPEG Thumbnail


# [To enable Branded Preview]: uncomment and insert the following into the plugin list
#                Branded Preview JPEG, \

# [To enable ImageMagick Thumbnail]:
#    remove "JPEG Thumbnail" from the plugin list
#    uncomment and insert the following line into the plugin list
#                ImageMagick Image Thumbnail, ImageMagick PDF Thumbnail, \
# [To enable ImageMagick Video Thumbnails (requires both ImageMagick and ffmpeg installed)]:
#    uncomment and insert the following line into the plugin list
#                ImageMagick Video Thumbnail, \
#    NOTE: pay attention to the ImageMagick policies and resource limits in its policy.xml
#          configuration file. The limits may have to be increased if a "cache resources
#          exhausted" error is thrown.

#Assign 'human-understandable' names to each filter
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.TikaTextExtractionFilter = Text Extractor
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.PDFBoxThumbnail = PDFBox JPEG Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter = ImageMagick Image Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter = ImageMagick PDF Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickVideoThumbnailFilter = ImageMagick Video Thumbnail

#Configure each filter's input format(s)
# NOTE: The TikaTextExtractionFilter can support any file formats that are supported by Apache Tika. So, you can easily
# add additional formats to your DSpace Bitstream Format registry and list them here. The current list of Tika supported
# formats is available at: https://tika.apache.org/2.3.0/formats.html
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = CSV
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = HTML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Excel
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Excel XML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Powerpoint
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Powerpoint XML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Microsoft Word XML
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = OpenDocument Presentation
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = OpenDocument Spreadsheet
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = OpenDocument Text
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = RTF
filter.org.dspace.app.mediafilter.TikaTextExtractionFilter.inputFormats = Text
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats = BMP, GIF, JPEG, PNG
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, GIF, JPEG, PNG
filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, PNG, JPG, TIFF, JPEG, JPEG 2000
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.ImageMagickVideoThumbnailFilter.inputFormats = Video MP4
filter.org.dspace.app.mediafilter.PDFBoxThumbnail.inputFormats = Adobe PDF

#Publicly accessible thumbnails of restricted content.
#List the MediaFilter names that would get publicly accessible permissions
#Any media filters not listed will instead inherit the permissions of the parent bitstream
#filter.org.dspace.app.mediafilter.publicPermission = JPEGFilter


I would appreciate any help with this.
Thanks in advance,

Manuela Klanovicz Ferreira

