Re: nutch adds %20 in urls instead of spaces

2024-01-10 Thread Steve Cohen
Thanks for the response, Markus. Disabling urlnormalizer-basic works.
On Tue, Jan 9, 2024 at 3:43 PM Markus Jelsma wrote:
> Hello Steve,
>
> Having those spaces normalized/encoded is expected behaviour with
> urlnormalizer-basic active. I would recommend to keep it this way and have
> all URLs in

Re: nutch adds %20 in urls instead of spaces

2024-01-09 Thread Markus Jelsma
Hello Steve, Having those spaces normalized/encoded is expected behaviour with urlnormalizer-basic active. I would recommend to keep it this way and have all URLs in Solr properly encoded. Having spaces in Solr IDs is also not recommended as it can lead to unexpected behaviour. If you really don'
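To illustrate the encoding being discussed: a space is not a valid raw character in a URL, so it is percent-encoded as %20. The sketch below uses Python's standard urllib.parse to show the round trip. It is not Nutch's urlnormalizer-basic code, and the path and the helper name encode_path are made up for the example.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path, leaving the "/" separators untouched."""
    # Hypothetical helper for illustration; not part of Nutch.
    return quote(path, safe="/")


if __name__ == "__main__":
    raw = "/shared docs/annual report.pdf"  # made-up path containing spaces
    encoded = encode_path(raw)
    print(encoded)           # each space becomes %20
    print(unquote(encoded))  # decoding restores the original path
```

Decoding with unquote recovers the original string, so an encoded ID can still be mapped back to the on-disk name when needed.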

Re: nutch adds %20 in urls instead of spaces

2024-01-09 Thread Jim Anderson
unsubscribe
On Tue, Jan 9, 2024 at 1:20 PM Steve Cohen wrote:
> Hello,
>
> I am updating a nutch crawl that read files in directories that have
> spaces. The urls show %20 instead of spaces. This doesn't seem to be what
> the behavior was in the past.
>
> In nutch 1.10 I get these results
>
> Nu