Wermter, Joachim
Fri, 13 Nov 2009 06:53:56 -0800
Hi, I'm using the UIMA Tika Annotator which calls Tika in the following way:
    Parser.parse(originalStream, handler, md);

Here originalStream is a BufferedInputStream. I have upgraded the Tika dependencies from 0.4 to 0.5-SNAPSHOT, and the problem I have now is that the InputStream is no longer decoded correctly as UTF-8 (e.g. German umlauts). Was there a change in 0.5-SNAPSHOT that affects this?

Best regards,
Joachim

Siemens AG
Corporate Technology, CT IC 1
Otto-Hahn-Ring 6
81739 München, Deutschland
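
For anyone trying to narrow down a symptom like the one described above, the following is a minimal sketch that uses only the Java standard library and does not call Tika at all. It checks whether a byte array is well-formed UTF-8 and shows what an umlaut looks like when UTF-8 bytes are decoded with the wrong charset. The class name Utf8Check and both method names are hypothetical, chosen for illustration only; whether this kind of mis-decoding is what actually happens inside 0.5-SNAPSHOT is an assumption, not something the message confirms.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tika or UIMA.
public class Utf8Check {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes UTF-8 bytes as ISO-8859-1 to reproduce the typical garbling. */
    static String misdecodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "M\u00fcnchen" is "München"; the escape avoids source-encoding issues.
        byte[] utf8 = "M\u00fcnchen".getBytes(StandardCharsets.UTF_8);

        System.out.println("valid UTF-8:   " + isValidUtf8(utf8));
        System.out.println("decoded right: " + new String(utf8, StandardCharsets.UTF_8));
        System.out.println("decoded wrong: " + misdecodeAsLatin1(utf8));
    }
}
```

If the input bytes pass isValidUtf8 but the extracted text contains two-character sequences where a single umlaut is expected, the bytes themselves are probably fine and the wrong charset is being applied somewhere during parsing.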