Re: [Dspace-tech] [SPAM] Media filter handling of .docx

2014-04-22 Thread SUZUKI Keiji
Hi Trevor, I wrote a new filter that can extract text from .doc, .docx, .ppt, .pptx, .xls and .xlsx files. See my commit log for details at the following URL: https://github.com/zuki/DSpace/commit/302de5d098cf5a3914498345a0e49ba56b796181 Regards, Keiji Suzuki Ebetsu, Japan
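For readers wondering why .docx needs its own handling: a .docx file is a ZIP archive whose main text lives in word/document.xml, so extractors written for the older binary .doc format cannot read it. The sketch below is not the code from the commit above and does not use Apache POI. It is a minimal, standard-library-only illustration of pulling paragraph text out of a .docx. The class name DocxTextSketch and the helper buildSampleDocx are hypothetical, and the helper only builds a stripped-down test fixture, not a file Word would open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: extracts paragraph text from a .docx using only the JDK.
public class DocxTextSketch {
    // WordprocessingML main namespace used by the w:p and w:t elements.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of word/document.xml, one line per w:p paragraph.
    // Limitations: tabs, line breaks, headers, footers and footnotes are
    // ignored, and paragraphs nested inside text boxes would be repeated.
    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry into memory so the XML parser cannot close the ZIP stream.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations to avoid XXE on untrusted uploads.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(buffer.toByteArray()));

                StringBuilder text = new StringBuilder();
                NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    if (i > 0) {
                        text.append('\n');
                    }
                    NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                }
                return text.toString();
            }
        }
        return ""; // no word/document.xml entry found
    }

    // Test fixture only: a ZIP holding just word/document.xml.
    // Paragraph strings must not contain XML special characters.
    public static byte[] buildSampleDocx(String... paragraphs) throws IOException {
        StringBuilder xml = new StringBuilder("<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = buildSampleDocx("Hello", "World");
        System.out.println(extractText(new ByteArrayInputStream(sample)));
    }
}
```

A production media filter would more likely delegate to a library such as POI, which also covers the .ppt/.pptx and .xls/.xlsx formats mentioned above. This sketch only shows the shape of the .docx problem.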

[Dspace-tech] [SPAM] Media filter handling of .docx

2014-04-16 Thread Trevor Wilson
The media filter doesn't currently process .docx files to enable full-text search. I've found a few mentions of this being a result of it using outdated text-mining tools instead of Apache POI for its processing. Has anyone rewritten this to make use of POI and got it to full-text-search the