[ https://issues.apache.org/jira/browse/NUTCH-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082239#comment-18082239 ]
ASF GitHub Bot commented on NUTCH-3173:
---------------------------------------
lfoppiano opened a new pull request, #917:
URL: https://github.com/apache/nutch/pull/917
This PR covers NUTCH-3173 for protocol-okhttp and attempts to solve the
problem in a generic way.
We add a new method, `getRawUrl()`, to the `Response.java` interface contract.
It returns the URL exactly as initially provided by the caller, while
`getUrl()` returns the actual URL used for the request.
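A minimal sketch of the two-URL contract described above, assuming a simplified stand-in interface. `SimpleResponse` and `SketchResponse` are hypothetical names, the real Nutch `Response` interface declares more methods (status code, headers, content), and the `String` return type of `getRawUrl()` is an assumption, not necessarily what the PR uses.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical, simplified stand-in for Nutch's Response interface.
// The real interface declares more methods than these two.
interface SimpleResponse {
  /** The URL actually used for the request (e.g. as re-parsed by the HTTP client). */
  URL getUrl();

  /** The URL string exactly as the caller passed it in (String return type is assumed). */
  String getRawUrl();
}

// Toy implementation that keeps the caller's URL string and the effective URL side by side.
final class SketchResponse implements SimpleResponse {
  private final String rawUrl;
  private final URL effectiveUrl;

  SketchResponse(String rawUrl, String effectiveUrl) {
    this.rawUrl = rawUrl;
    try {
      // The effective URL is expected to be already canonical (e.g. percent-encoded),
      // so it should be a valid absolute URI.
      this.effectiveUrl = URI.create(effectiveUrl).toURL();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("not a valid URL: " + effectiveUrl, e);
    }
  }

  @Override
  public URL getUrl() {
    return effectiveUrl;
  }

  @Override
  public String getRawUrl() {
    return rawUrl;
  }
}
```

Keeping both values lets downstream code compare what the caller asked for with what the HTTP client actually requested, without re-parsing either string.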
> protocol-okhttp: store OkHttp's internal URL in response metadata
> -----------------------------------------------------------------
>
> Key: NUTCH-3173
> URL: https://issues.apache.org/jira/browse/NUTCH-3173
> Project: Nutch
> Issue Type: Improvement
> Components: plugin, protocol
> Affects Versions: 1.23
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.23
>
>
> OkHttp uses its
> [HttpUrl|https://square.github.io/okhttp/5.x/okhttp/okhttp3/-http-url/index.html]
> for HTTP requests. There are some differences between HttpUrl and
> java.net.URL (or java.net.URI), and HttpUrl.parse may parse a URL string
> differently than Java's URL class.
> It would be good to store the stringified HttpUrl in the response metadata,
> at least if it differs from the original URL string. The
> [Request|https://square.github.io/okhttp/5.x/okhttp/okhttp3/-request/index.html]
> holds the HttpUrl object.
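The suggestion above (store the stringified HttpUrl only when it differs from the original URL string) can be sketched without depending on OkHttp. Assumptions: a plain `Map<String, String>` stands in for Nutch's `Metadata` class; `RequestUrlMetadata`, `recordIfDifferent` and the key `_request_url_` are hypothetical names; in the plugin, the `requestUrl` argument would come from the OkHttp `Request`, e.g. `request.url().toString()`.

```java
import java.util.Map;

// Hypothetical helper; the actual patch may use a different key, class, or metadata type.
final class RequestUrlMetadata {
  // Hypothetical metadata key for the client's stringified request URL.
  static final String REQUEST_URL_KEY = "_request_url_";

  private RequestUrlMetadata() {}

  /**
   * Records requestUrl under REQUEST_URL_KEY only if it is non-null and differs
   * from originalUrl. Returns true if an entry was added, false otherwise.
   */
  static boolean recordIfDifferent(Map<String, String> metadata, String originalUrl, String requestUrl) {
    if (requestUrl == null || requestUrl.equals(originalUrl)) {
      return false; // nothing new to record
    }
    metadata.put(REQUEST_URL_KEY, requestUrl);
    return true;
  }
}
```

Comparing the string forms (rather than parsed URL objects) is deliberate in this sketch: the point is to surface cases where the two parsers disagree on the textual form of the URL.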
--
This message was sent by Atlassian Jira
(v8.20.10#820010)