Re: 404 Errors on update/extract
Hi Leon,

Feel free to create a JIRA issue (https://issues.apache.org/jira/secure/Dashboard.jspa) and then open a GitHub pull request to fix the example name. The documentation is in AsciiDoc format at:

https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide/src

with file names matching those on the server.

This could be a great issue to cut your teeth on with helping Solr :-)

Regards,
Alex.

On Fri, 5 Feb 2021 at 10:35, nq wrote:
> What can I do to help update the docs?
Re: 404 Errors on update/extract
Hi Alex,

Thanks a lot for your help!

I tested the same steps using the 'techproducts' example as you proposed, and it worked fine.

You are right, the documentation seems to be outdated in this respect. I have just reviewed the solrconfig.xml of the 'schemaless' example and found that the Solr Cell configuration is missing entirely. After adding it as described at

https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-extractingrequesthandler-in-solrconfig-xml

everything worked fine again.

What can I do to help update the docs?

Best regards,
Leon

On 05.02.21 at 16:15, Alexandre Rafalovitch wrote:
> I think the extract handler is not defined in schemaless. This may be
> a change from before and the documentation is out of sync.
Re: 404 Errors on update/extract
I think the extract handler is not defined in the 'schemaless' example. This may be a change from earlier releases, and the documentation is out of sync.

Can you try the 'techproducts' example instead of schemaless:

    bin/solr stop    (if you are still running it)
    bin/solr start -e techproducts

Then run the import command again.

The Tika integration is defined in solrconfig.xml and needs both the handler defined and some libraries loaded. Once you have confirmed you like what you see, you can copy those into whatever configuration you are working with.

Regards,
Alex.

On Fri, 5 Feb 2021 at 07:38, nq wrote:
> I am new to Solr and tried to follow the guide to upload PDF data using
> Tika, on Solr 8.7.0 (running on Debian 10):
>
> https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html
>
> but I get an HTTP 404 error when trying to import the file.
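For anyone checking their own configuration against the advice above, here is a minimal sketch of a shell helper that looks for the two pieces mentioned (the handler and the library directives) in a solrconfig.xml. The function name check_solr_cell_config is hypothetical, and the grep patterns are assumptions based on how the 'techproducts' configuration typically declares them. It is a plain text search, not an XML parser, so a commented-out block would still count as "found".

```shell
#!/bin/sh
# Hypothetical helper: report whether a solrconfig.xml appears to contain
# the two pieces Solr Cell needs. Heuristic text search only.
check_solr_cell_config() {
    config_file="$1"
    rc=0

    # 1. A request handler registered under the name /update/extract.
    if grep -q 'name="/update/extract"' "$config_file"; then
        echo "handler: found"
    else
        echo "handler: missing"
        rc=1
    fi

    # 2. A <lib> directive that mentions the extraction libraries
    #    (assumed to contain the word "extraction" in its path).
    if grep -q '<lib .*extraction' "$config_file"; then
        echo "libs: found"
    else
        echo "libs: missing"
        rc=1
    fi

    # Exit status 0 only when both pieces were found.
    return "$rc"
}
```

Usage would be along the lines of `check_solr_cell_config path/to/conf/solrconfig.xml`, where the path is whatever configuration directory your collection actually uses.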
404 Errors on update/extract
Hi,

I am new to Solr and tried to follow the guide to upload PDF data using Tika, on Solr 8.7.0 (running on Debian 10):

https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html

but I get an HTTP 404 error when trying to import the file.

In the Solr installation directory, after spinning up the example server using

    solr/bin/solr -e schemaless

I first used the Post Tool to index a PDF file as described in the guide, which gave the following output (paths truncated using “[…]” for privacy reasons):

    bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params "literal.id=doc1"

> java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes
> -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files
> org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
> SimplePostTool version 5.0.0
> Posting files to [base] url
> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> Entering auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file solr-word.pdf (application/pdf) to [base]/extract
> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
> http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> SimplePostTool: WARNING: Response:
>
> Error 404 Not Found
>
> HTTP ERROR 404 Not Found
>
> URI:/solr/gettingstarted/update/extract
> STATUS:404
> MESSAGE:Not Found
> SERVLET:default
>
> SimplePostTool: WARNING: IOException while reading response:
> java.io.FileNotFoundException:
> http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
>
> 1 files indexed.
> COMMITting Solr index changes to
> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> Time spent: 0:00:00.038

resulting in no actual changes being visible in Solr.

Using curl results in the same HTTP response:

> curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true'
> -F "myfile=@example/exampledocs/solr-word.pdf"
>
> Error 404 Not Found
>
> HTTP ERROR 404 Not Found
>
> URI:/solr/gettingstarted/update/extract
> STATUS:404
> MESSAGE:Not Found
> SERVLET:default

Sorry if this has already been discussed somewhere; I have not been able to find anything helpful yet.

Thank you!

Leon
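Since the Post Tool output above still reports "1 files indexed" despite the 404, it can help to look at the HTTP status code directly. The following is a rough sketch, not part of the original report: the function names extract_url, post_status and explain_status are hypothetical, the host and port are taken from the URLs in the output above, and the commit=true parameter is an assumption based on the guide's curl example.

```shell
#!/bin/sh
# Build the /update/extract URL for a collection ($1) and document id ($2).
# Host and port match the example server; commit=true is an assumption.
extract_url() {
    printf 'http://localhost:8983/solr/%s/update/extract?literal.id=%s&commit=true' "$1" "$2"
}

# Post a file ($3) to the handler and print only the HTTP status code.
# Requires a running Solr instance; nothing is printed from the body.
post_status() {
    curl -s -o /dev/null -w '%{http_code}' -F "myfile=@$3" "$(extract_url "$1" "$2")"
}

# Turn a status code into a short human-readable verdict.
explain_status() {
    case "$1" in
        200) echo "ok: document accepted" ;;
        404) echo "not found: handler or collection is not available at this URL" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}
```

With a server running, something like `explain_status "$(post_status gettingstarted doc1 example/exampledocs/solr-word.pdf)"` would print the verdict for the same request as in the report.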