[ https://issues.apache.org/jira/browse/SOLR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977067#action_12977067 ]
Lance Norskog commented on SOLR-2116:
-------------------------------------

Great! I'll try it out on 3.x and trunk.

Speaking of Tika, have you ever seen a tikaconfig file? I can't find one anywhere on the web or in the Tika source.

> TikaEntityProcessor does not find parser by default
> ---------------------------------------------------
>
>                 Key: SOLR-2116
>                 URL: https://issues.apache.org/jira/browse/SOLR-2116
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 3.1, 4.0
>            Reporter: Lance Norskog
>         Attachments: pdflist-data-config.xml, pdflist.xml, SOLR-2116.patch
>
>
> The TikaEntityProcessor does not find the correct document parser by default.
> This is in a two-level DIH config file. I have attached
> pdflist-data-config.xml and pdflist.xml, the XML file that supplies the file
> list. To test this, you will need the current 3.x branch or 4.0 trunk.
> # Set up a Tika-enabled Solr.
> # Copy any PDF file to /tmp/testfile.pdf.
> # Copy pdflist-data-config.xml to your solr/conf.
> # Add this snippet to your solrconfig.xml:
> {code:xml}
> <requestHandler name="/pdflist"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">pdflist-data-config.xml</str>
>   </lst>
> </requestHandler>
> {code}
> [http://localhost:8983/solr/pdflist?command=full-import] will make one
> document with the id and text fields populated. If you remove this line:
> {code}
> parser="org.apache.tika.parser.pdf.PDFParser"
> {code}
> from the TikaEntityProcessor entity, the parser will not be found and you
> will get a document with the "id" field and nothing else.
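
For readers without the attachments, a two-level DIH config of the kind described above might look roughly like the sketch below. This is not the attached pdflist-data-config.xml. The entity and data source names, the /tmp/pdflist.xml location, and the layout of the list file (a /files/file element with id and path children) are assumptions made for illustration. Only the TikaEntityProcessor entity, its parser attribute, and the id and text fields come from the report.

{code:xml}
<dataConfig>
  <!-- Assumed: one data source for the XML file list, one for binary PDF content -->
  <dataSource name="xml" type="FileDataSource" encoding="UTF-8"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer level: walk the file list (hypothetical path and XPath layout) -->
    <entity name="pdflist" processor="XPathEntityProcessor" dataSource="xml"
            url="/tmp/pdflist.xml" forEach="/files/file" rootEntity="false">
      <field column="id" xpath="/files/file/id"/>
      <field column="path" xpath="/files/file/path"/>
      <!-- Inner level: extract text from each listed file with Tika -->
      <entity name="pdf" processor="TikaEntityProcessor" dataSource="bin"
              url="${pdflist.path}" format="text"
              parser="org.apache.tika.parser.pdf.PDFParser">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
{code}

Under these assumptions, deleting the parser attribute from the inner entity would be the change that triggers the reported behavior: the document is created with "id" but no "text".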