[ 
https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893328#action_12893328
 ] 

David Thibault commented on SOLR-1902:
--------------------------------------

OK, I tried Tommaso's patch and it worked great.  With the solr.war that came 
with the 1.4.1 distribution I still got the NoSuchMethodError, but with the 
patched and freshly recompiled apache-solr-1.4.2-dev.war it worked.  I thought 
I had tried that before with the other patches and it didn't work.  In any 
case, it seems to be working with the following steps, where:

SOLR_DIST=the folder where the solr 1.4.1 distribution was unzipped.
SOLR_HOME=the folder where tomcat or jetty will look to load Solr.

1) fresh copy of solr 1.4.1 distribution unzipped to SOLR_DIST

2) update SOLR_DIST/contrib/extraction/lib with the following:
   jempbox-1.2.1.jar
   fontbox-1.2.1.jar
   pdfbox-1.2.1.jar
   tika-core-0.8-SNAPSHOT.jar
   tika-parsers-0.8-SNAPSHOT.jar
  (and remove old tika and pdfbox-related jars)

3) patch with Tommaso's patch above in the SOLR_DIST directory:
patch -p0 < SOLR1902_patch_to_141.txt

4) still in SOLR_DIST, run "ant dist"

5) copy SOLR_DIST/dist/*.jar to SOLR_HOME/lib
6) copy SOLR_DIST/dist/solrj-lib to SOLR_HOME/lib/solrj-lib
7) copy SOLR_DIST/dist/apache-solr-1.4.2-dev.war to SOLR_HOME/
8) remove SOLR_HOME/contrib/extraction/lib/*.jar
9) copy SOLR_DIST/contrib/extraction/lib/*.jar to SOLR_HOME/contrib/extraction/lib/
10) (for me in tomcat) add CATALINA_HOME/conf/Catalina/localhost/solr.xml with 
the following content (<SOLR_HOME> is only a placeholder and is not valid XML 
as written; substitute the actual directory):
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="<SOLR_HOME>\apache-solr-1.4.2-dev.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="<SOLR_HOME>" override="true"/>
</Context>
11) restart tomcat.
12) upload content via curl.
13) jump for joy when it doesn't crash on me again...=)
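
For anyone who wants to script this, steps 5 through 9 and the upload in step 
12 could be sketched roughly as below.  This is only a sketch under some 
assumptions: the function names are made up for illustration, "ant dist" 
(step 4) is assumed to have already been run in SOLR_DIST, and the upload 
assumes the extraction handler is mapped at /update/extract as in the example 
solrconfig.xml.  The base URL, document id and "myfile" field name are 
placeholders, so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper covering steps 5-9: copy the rebuilt artifacts from
# SOLR_DIST into SOLR_HOME. Not part of Solr; names are illustrative only.
deploy_patched_solr() {
    solr_dist=$1   # folder where the solr 1.4.1 distribution was unzipped
    solr_home=$2   # folder where tomcat or jetty will look to load Solr

    mkdir -p "$solr_home/lib/solrj-lib" \
             "$solr_home/contrib/extraction/lib" || return 1

    # 5) copy SOLR_DIST/dist/*.jar to SOLR_HOME/lib
    cp "$solr_dist"/dist/*.jar "$solr_home/lib/" || return 1

    # 6) copy the contents of SOLR_DIST/dist/solrj-lib to SOLR_HOME/lib/solrj-lib
    cp -R "$solr_dist/dist/solrj-lib/." "$solr_home/lib/solrj-lib/" || return 1

    # 7) copy the rebuilt war to SOLR_HOME
    cp "$solr_dist/dist/apache-solr-1.4.2-dev.war" "$solr_home/" || return 1

    # 8) remove the old extraction jars from SOLR_HOME
    rm -f "$solr_home"/contrib/extraction/lib/*.jar

    # 9) copy the updated extraction jars (tika 0.8-SNAPSHOT, pdfbox 1.2.1, ...)
    cp "$solr_dist"/contrib/extraction/lib/*.jar \
       "$solr_home/contrib/extraction/lib/" || return 1
}

# Build the extraction URL for one document (assumes /update/extract).
extract_url() {
    printf '%s/update/extract?literal.id=%s&commit=true\n' "$1" "$2"
}

# 12) upload one file via curl; "myfile" is an arbitrary form field name.
upload_via_curl() {
    curl "$(extract_url "$1" "$2")" -F "myfile=@$3"
}
```

Usage would then look something like this (paths and URL are examples only):

    deploy_patched_solr /opt/solr-dist /opt/solr-home
    upload_via_curl http://localhost:8080/solr doc1 test.pdf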

Best,
Dave 

> Tika no longer properly extracts content in Solr
> ------------------------------------------------
>
>                 Key: SOLR-1902
>                 URL: https://issues.apache.org/jira/browse/SOLR-1902
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 4.0
>
>         Attachments: SOLR1902_patch_to_141.txt
>
>
> See 
> http://www.lucidimagination.com/search/document/2ca3fe953038a54f/problem_with_pdf_upgrading_cell#22360c8261801f24
> It appears that since the upgrade to Tika 0.7, Tika is now selecting an 
> EmptyParser when uploading docs, which then outputs an empty XHTML 
> representation.  Still, it's strange that the tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

