Hi folks, if I am not mistaken, you currently cannot configure a specific ContentHandler when using tika-server. You can configure your own parser [0], but you cannot control which ContentHandler that parser uses to extract text and metadata (e.g., you cannot use PhoneExtractingContentHandler, StandardsExtractingContentHandler, etc.). If that is correct, it would be nice to enable specific ContentHandlers in tika-server, and I would like to discuss how to solve this issue in a general way.
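To make the problem concrete: a ContentHandler in this sense is a SAX decorator. It sees the text the parser emits, forwards it to the next handler, and can derive extra metadata from it along the way. The following is a simplified, hypothetical stand-in (`ToyPhoneExtractingHandler`, with a deliberately naive regex) that uses only the JDK's `org.xml.sax` API. It is not Tika's PhoneExtractingContentHandler. It only illustrates the kind of wrapping that tika-server currently offers no way to request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified stand-in for a decorator such as PhoneExtractingContentHandler:
// every SAX event is forwarded to the wrapped handler, and the character data is also
// buffered so it can be scanned for phone-like strings when the document ends.
class ToyPhoneExtractingHandler extends DefaultHandler {
    // Illustrative pattern only: an optional '+', then digits, dashes, or spaces, ending in a digit
    private static final Pattern PHONE = Pattern.compile("\\+?\\d[\\d\\- ]{6,}\\d");

    private final ContentHandler delegate;
    private final StringBuilder text = new StringBuilder();
    private final List<String> phoneNumbers = new ArrayList<>();

    ToyPhoneExtractingHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void startDocument() throws SAXException {
        delegate.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        delegate.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        delegate.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);          // keep a copy for the metadata pass
        delegate.characters(ch, start, length);  // the wrapped handler still gets the text
    }

    @Override
    public void endDocument() throws SAXException {
        Matcher m = PHONE.matcher(text);
        while (m.find()) {
            phoneNumbers.add(m.group());
        }
        delegate.endDocument();
    }

    List<String> getPhoneNumbers() {
        return phoneNumbers;
    }
}
```

In embedded use, you choose such a wrapper yourself when calling the parser. Over HTTP there is currently no equivalent knob.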
I propose two solutions:
1. extend the TikaConfig class so that a specific ContentHandler can be declared in tika-config.xml;
2. select the ContentHandler used for parsing through an HTTP header, for example:
   curl -T filename.pdf http://localhost:9998/meta --header "X-Content-Handler: PhoneExtractingContentHandler"
   This would also affect the TikaResource.java class.

I look forward to your feedback. I strongly believe that every user who runs Tika as a service through tika-server and needs to extract content and metadata such as phone numbers, standard references, etc. would be very happy with this.

Thanks a lot,
Giuseppe
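A rough sketch of how option 2 might resolve the header on the server side. It assumes a hypothetical `ContentHandlerRegistry` (not an existing Tika class) that maps a short, registered name to a decorator wrapping the server's base handler. Only the JDK's `org.xml.sax` types are used, so the real Tika handlers would be plugged in through `register(...)`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical registry: maps the value of an "X-Content-Handler" request header to a
// decorator that wraps the base handler the server would otherwise use. Names and the
// registration mechanism are illustrative assumptions, not part of the Tika API.
class ContentHandlerRegistry {
    private final Map<String, UnaryOperator<ContentHandler>> decorators = new HashMap<>();

    // Names are stored lower-cased so header matching is case-insensitive (a design choice)
    void register(String name, UnaryOperator<ContentHandler> decorator) {
        decorators.put(name.toLowerCase(Locale.ROOT), decorator);
    }

    // No header: keep today's behaviour and return the base handler unchanged.
    // Unknown name: reject it instead of guessing or loading a class by name.
    ContentHandler resolve(String headerValue, ContentHandler base) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return base;
        }
        UnaryOperator<ContentHandler> decorator =
                decorators.get(headerValue.trim().toLowerCase(Locale.ROOT));
        if (decorator == null) {
            throw new IllegalArgumentException("Unknown content handler: " + headerValue);
        }
        return decorator.apply(base);
    }
}
```

In a JAX-RS resource such as TikaResource, the header value could be obtained with `@HeaderParam("X-Content-Handler")` and passed to `resolve(...)` before calling the parser. Using an explicit allow-list of registered names, rather than instantiating whatever class name the client sends, keeps clients from loading arbitrary classes on the server. The same registry could also be populated from tika-config.xml, which would cover option 1.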
