Hello,
I am updating a Nutch crawl that reads files in directories whose names
contain spaces. The URLs show %20 instead of spaces, which doesn't seem to
match the behavior in the past.
In Nutch 1.10 I get these results:
Nutch 1.10
ParseData::
Version: 5
Status: success(1,0)
Title: Index of
On Tue, Jan 9, 2024 at 1:20 PM Steve Cohen wrote:
Hello Steve,
Having those spaces normalized (percent-encoded as %20) is expected behaviour
when urlnormalizer-basic is active. I would recommend keeping it this way and
having all URLs in Solr properly encoded. Spaces in Solr IDs are also not
recommended, as they can lead to unexpected behaviour.
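To illustrate what "properly encoded" means here, the sketch below uses the JDK's java.net.URI, not Nutch's own urlnormalizer-basic code, so treat it as an approximation of the idea rather than the plugin's exact behaviour. The multi-argument URI constructor percent-encodes characters that are illegal in a path, so a space becomes %20, and getPath() decodes it again when a readable form is needed (for example for display), leaving the encoded URL untouched as the ID. The host example.com, the sample path, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SpaceEncodingDemo {

    // Builds a URL from a raw (unencoded) path. The multi-argument URI
    // constructor quotes characters that are illegal in a path, so a
    // space is emitted as %20.
    static String encodeUrl(String scheme, String host, String rawPath) {
        try {
            return new URI(scheme, host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL components", e);
        }
    }

    // Recovers the human-readable path (with spaces) from an encoded URL,
    // e.g. for display, while the encoded form stays the stored ID.
    static String decodedPath(String encodedUrl) {
        try {
            return new URI(encodedUrl).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL: " + encodedUrl, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical directory name containing a space.
        String url = encodeUrl("http", "example.com", "/shared/My Documents/report.pdf");
        System.out.println(url);              // http://example.com/shared/My%20Documents/report.pdf
        System.out.println(decodedPath(url)); // /shared/My Documents/report.pdf
    }
}
```

Decoding only at display time keeps a single, unambiguous form of each URL in the index, which is the reason for preferring the encoded form as the Solr ID.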
If you really