Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "cTAKESParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/cTAKESParser?action=diff&rev1=8&rev2=9

= Setting up the Tika Config file =
- You will need a custom Tika configuration file for the parser. You can find one [[here|https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/tika-config.xml]]. The reason is that cTAKESParser decorates AutoDetectParser, so in principle it can handle *any* file type that AutoDetectParser can. However, you have to make cTAKESParser intercept the MIME types you want it to extract biomedical information from. So if you want Tika and its cTAKESParser to extract biomedical information from application/pdf files, you will need this custom config, and you will need to add application/pdf as a MIME type that the parser can handle. The default config provided looks like:
+ You will need a custom Tika configuration file for the parser. You can find one [[https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/tika-config.xml|here]]. The reason is that cTAKESParser decorates AutoDetectParser, so in principle it can handle *any* file type that AutoDetectParser can. However, you have to make cTAKESParser intercept the MIME types you want it to extract biomedical information from. So if you want Tika and its cTAKESParser to extract biomedical information from application/pdf files, you will need this custom config, and you will need to add application/pdf as a MIME type that the parser can handle. The default config provided looks like:

{{{
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
