Hello, I am updating a Nutch crawl that reads files in directories whose names contain spaces. The URLs now show %20 instead of spaces. This doesn't seem to be how it behaved in the past.
In Nutch 1.10 I get these results:

ParseData::
Version: 5
Status: success(1,0)
Title: Index of /nycor/10-15-2018 and on - Scanned
Outlinks: 4
  outlink: toUrl: file:/nycor/10-15-2018 and on - Scanned/2018/ anchor: 2018/
  outlink: toUrl: file:/nycor/10-15-2018 and on - Scanned/2019/ anchor: 2019/
  outlink: toUrl: file:/nycor/10-15-2018 and on - Scanned/2022/ anchor: 2022/
  outlink: toUrl: file:/nycor/10-15-2018 and on - Scanned/Shipment Date Unknown/ anchor: Shipment Date Unknown/

In Nutch 1.19, I get this:

ParseData::
Version: 5
Status: success(1,0)
Title: Index of /nycor/10-15-2018 and on - Scanned
Outlinks: 4
  outlink: toUrl: file:/nycor/10-15-2018%20and%20on%20-%20Scanned/2018/ anchor: 2018/
  outlink: toUrl: file:/nycor/10-15-2018%20and%20on%20-%20Scanned/2019/ anchor: 2019/
  outlink: toUrl: file:/nycor/10-15-2018%20and%20on%20-%20Scanned/2022/ anchor: 2022/
  outlink: toUrl: file:/nycor/10-15-2018%20and%20on%20-%20Scanned/Shipment%20Date%20Unknown/ anchor: Shipment Date Unknown/

We are uploading to Solr and the links aren't right with the %20s in the URL. How do I remove the %20s?

Thanks,
Steve Cohen