lewismc commented on PR #733: URL: https://github.com/apache/nutch/pull/733#issuecomment-1157149313
This is exciting!!! Excellent debugging 👍 You got further than I did. I won't be able to test it until next week at the earliest.

Thinking back, I did observe revisits (recursive access) to URLStreamHandlerFactory but didn't pursue that line of inquiry at the time. To get a bit more context, I reviewed [HADOOP-14598-005.patch](https://issues.apache.org/jira/secure/attachment/12880380/HADOOP-14598-005.patch) and the current class it affects. Having read the code, the change makes more sense, but admittedly I won't have the full context until I debug this myself.

I also took a look at [hadoop-hdfs TestUrlStreamHandler.java](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestUrlStreamHandler.java), which I really like the look of. To build more confidence in this aspect of the codebase, we could create some tests for the [nutch URLStreamHandlerFactory.java](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java).