Konstantin Avdeev created TIKA-1963:
---------------------------------------

             Summary: Configuring Parsers: "high degree of control over which parsers are or aren't used" does not work
                 Key: TIKA-1963
                 URL: https://issues.apache.org/jira/browse/TIKA-1963
             Project: Tika
          Issue Type: Bug
          Components: config
    Affects Versions: 1.12
         Environment: windows, java version "1.8.0_73", 64 bit
            Reporter: Konstantin Avdeev


Hi everybody!
I'm trying to whitelist a single MIME type for OCR with the following 
config:

{code}
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/pdf</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
{code}

The idea is to enable the Tesseract parser for the PDF format only.
However, this configuration disables Tesseract completely.
Is this the expected behaviour or a bug?
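
For anyone trying to reproduce this, here is a minimal sketch that lists what the config above declares for each <parser> element. It uses only the JDK's DOM parser, not Tika itself, so it does not reproduce how Tika resolves includes and excludes; the class name ConfigSummary and the method summarize are hypothetical names made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Tika): summarizes a tika-config XML string.
class ConfigSummary {

    // Returns one line per <parser> element: class, included MIME types,
    // excluded MIME types, and excluded parser classes.
    static List<String> summarize(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> lines = new ArrayList<>();
            NodeList parsers = root.getElementsByTagName("parser");
            for (int i = 0; i < parsers.getLength(); i++) {
                Element p = (Element) parsers.item(i);
                lines.add(p.getAttribute("class")
                        + " mime=" + children(p, "mime", false)
                        + " mime-exclude=" + children(p, "mime-exclude", false)
                        + " parser-exclude=" + children(p, "parser-exclude", true));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("config could not be read", e);
        }
    }

    // Collects direct child elements named `tag`; reads either the
    // class attribute or the trimmed text content of each child.
    private static List<String> children(Element parent, String tag, boolean useClassAttr) {
        List<String> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                Element e = (Element) n;
                out.add(useClassAttr ? e.getAttribute("class") : e.getTextContent().trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same config as in the report, inlined as a string.
        String xml = "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
                + "<mime-exclude>application/pdf</mime-exclude>"
                + "<parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>"
                + "</parser>"
                + "<parser class=\"org.apache.tika.parser.pdf.PDFParser\">"
                + "<mime>application/pdf</mime>"
                + "</parser>"
                + "</parsers></properties>";
        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```

Run against the config above, the summary shows TesseractOCRParser only in the parser-exclude list of DefaultParser; it is not declared as a <parser> element of its own anywhere in the file.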
Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
