Hi Trevor,
I implemented this as a new filter that can extract text from .doc, .docx,
.ppt, .pptx, .xls and .xlsx files.
See my commit log for details at the following URL:
https://github.com/zuki/DSpace/commit/302de5d098cf5a3914498345a0e49ba56b796181
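For anyone who wants a feel for what text extraction from a .docx involves before reading the commit, here is a minimal sketch. It is not the code from the commit and it does not use POI: as a stand-in it relies only on the JDK (Java 9+), on the assumption that a .docx is a ZIP archive whose main body lives in word/document.xml. The class name DocxTextSketch and the method extractText are hypothetical names chosen for illustration, and headers, footers, footnotes and the other Office formats are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only: not DSpace code and not the POI-based filter.
public class DocxTextSketch {
    // Namespace of WordprocessingML elements such as w:p and w:t.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text, one line per paragraph, or "" if the archive
    // has no word/document.xml entry.
    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry first so the XML parser cannot close the ZIP stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(xml));

                StringBuilder text = new StringBuilder();
                NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    // Each w:t element holds one run of visible text.
                    NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                    text.append('\n');
                }
                return text.toString();
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1) {
            try (InputStream in = new FileInputStream(args[0])) {
                System.out.print(extractText(in));
            }
        }
    }
}
```

Running it with the path to a .docx file as the only argument prints the extracted paragraphs. A library such as POI covers far more cases (legacy binary .doc, spreadsheets, slides), which is why the filter itself builds on it rather than on hand-rolled parsing like this.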
Regards,
Keiji Suzuki
Ebetsu, Japan
2014-04-16 23:05
The media filter doesn't currently process .docx files to enable full-text
search.
I've found a few mentions that this is because it uses outdated
text-mining tools instead of Apache POI for its processing.
Has anyone rewritten this to make use of POI and got it to full-text-search
the