Re: 404 Errors on update/extract

2021-02-05 Thread Alexandre Rafalovitch
Hi Leon,

Feel free to create a JIRA issue at
https://issues.apache.org/jira/secure/Dashboard.jspa
and then open a GitHub pull request to fix the example name. The
documentation is in AsciiDoc format at:
https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide/src
with file names matching those on the server.

This could be a great first issue to cut your teeth on as a Solr contributor :-)

Regards,
   Alex.


Re: 404 Errors on update/extract

2021-02-05 Thread nq

Hi Alex,


Thanks a lot for your help!

I have tested the same using the 'techproducts' example as proposed, and 
it worked fine.



You are right, the documentation seems to be outdated in this respect.

I have just reviewed the solrconfig.xml of the 'schemaless' example and
found that the Solr Cell configuration was missing entirely.


After adding it as described at

https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-extractingrequesthandler-in-solrconfig-xml

everything worked fine again.


What can I do to help update the docs?


Best regards,

Leon


Re: 404 Errors on update/extract

2021-02-05 Thread Alexandre Rafalovitch
I think the extract handler is not defined in the schemaless example.
This may have changed at some point, leaving the documentation out of sync.

Can you try the 'techproducts' example instead of schemaless:
bin/solr stop (if you are still running it)
bin/solr start -e techproducts

Then run the import command again.

The Tika integration is defined in solrconfig.xml and needs both the
handler defined and some libraries loaded. Once you have confirmed that
you like what you see, you can copy those into whatever configuration
you are working with.
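
For illustration, those two pieces might look roughly like this in
solrconfig.xml. This is only a sketch modelled on the Solr Cell section
of the 8.x reference guide, not a copy of any particular shipped config:
the lib paths assume the default layout of the Solr distribution, and the
mapping of extracted content to _text_ assumes your schema has such a field.

```xml
<!-- Load the Tika/Solr Cell libraries (paths assume the stock distribution layout) -->
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<!-- Register the handler that answers requests to /update/extract -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Lowercase the metadata field names produced by Tika -->
    <str name="lowernames">true</str>
    <!-- Assumption: the schema has a _text_ field to receive the extracted body -->
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
```

If the handler element is missing from the config, a request to
/update/extract has nothing to route to, which would match the 404 you saw.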

Regards,
   Alex.


404 Errors on update/extract

2021-02-05 Thread nq

Hi,


I am new to Solr and tried to follow the guide to upload PDF data using 
Tika, on Solr 8.7.0 (running on Debian 10):


https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html

but I get an HTTP 404 error when trying to import the file.


In the Solr installation directory, after spinning up the example server
using

solr/bin/solr -e schemaless

I first used the Post Tool to index a PDF file as described in the
guide, giving the following output (paths truncated using “[…]” for
privacy reasons):

bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params "literal.id=doc1"


java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file solr-word.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
SimplePostTool: WARNING: Response:
Error 404 Not Found
HTTP ERROR 404 Not Found
URI:/solr/gettingstarted/update/extract
STATUS:404
MESSAGE:Not Found
SERVLET:default
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
Time spent: 0:00:00.038

resulting in no actual changes being visible in Solr.


Using curl results in the same HTTP response:

curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"

Error 404 Not Found
HTTP ERROR 404 Not Found
URI:/solr/gettingstarted/update/extract
STATUS:404
MESSAGE:Not Found
SERVLET:default

Sorry if this has already been discussed somewhere; I have not been able 
to find anything helpful yet.


Thank you!

Leon