Hi,

I am new to Solr and tried to follow the guide to upload PDF data using Tika, on Solr 8.7.0 (running on Debian 10):

https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html

but I get an HTTP 404 error when trying to import the file.


In the Solr installation directory, after starting the example server using

solr/bin/solr -e schemaless

I first used the Post Tool to index a PDF file as described in the guide, which produced the following output (paths truncated with “[…]” for privacy reasons):

bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params "literal.id=doc1"

java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file solr-word.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
Time spent: 0:00:00.038

Although the tool reports "1 files indexed", no changes are actually visible in Solr.


Using curl results in the same HTTP response:

curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>


Sorry if this has already been discussed somewhere; I have not been able to find anything helpful yet.

Thank you!

Leon
