[ https://issues.apache.org/jira/browse/CONNECTORS-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396608#comment-13396608 ]
Karl Wright commented on CONNECTORS-477:
----------------------------------------
bq. But there are sites in Japan that have improperly encoded URL links.
bq. I want to support this in the web connector, but I'll keep thinking about a better solution for a while.
I agree, especially since some browsers accept the bad URLs. We need a
reference behavior to emulate and to evaluate our code against. The changes
should go in the org.apache.manifoldcf.crawler.connectors.webconnector.WebURL
class.
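As a rough illustration only, here is a minimal sketch of one behavior we could emulate: percent-encoding any Unicode space character as UTF-8 bytes before the URL is parsed. The class and method names (WhitespaceUrlEncoder, encodeWhitespace) are hypothetical and are not part of ManifoldCF, and the choice of UTF-8 is an assumption about what browsers do for the path component.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT part of ManifoldCF: percent-encodes space
// characters (ASCII space, U+3000, and other Unicode space separators)
// and leaves every other character untouched.
class WhitespaceUrlEncoder {

    static String encodeWhitespace(String rawUrl) {
        StringBuilder out = new StringBuilder(rawUrl.length() + 16);
        for (int i = 0; i < rawUrl.length(); ) {
            int cp = rawUrl.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isSpaceChar(cp)) {
                // Assumption: encode the character's UTF-8 bytes as %XX.
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // \u3000 is the full-width (ideographic) space.
        String raw = "http://localhost/aaa\u3000bbb/aaa.txt";
        String fixed = encodeWhitespace(raw);
        System.out.println(fixed);
        // The encoded form parses without a URISyntaxException.
        System.out.println(new URI(fixed).getRawPath());
    }
}
```

This deliberately touches only space characters, so links that are already correctly encoded pass through unchanged. Whether other malformed characters should get the same treatment depends on what the reference browsers do.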
> Support for full-width space against url
> ----------------------------------------
>
> Key: CONNECTORS-477
> URL: https://issues.apache.org/jira/browse/CONNECTORS-477
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Reporter: Shinichiro Abe
> Assignee: Shinichiro Abe
> Priority: Minor
> Fix For: ManifoldCF next
>
> Attachments: CONNECTORS-477.patch
>
>
> When a URL includes a full-width space (U+3000), MCF can't ingest the document.
> e.g.
> 1. File name
> http://server/site1/Shared%20Documents/test/aaa bbb.txt
> 2. Path
> http://localhost/aaa bbb/aaa.txt
> MCF's log says:
> {noformat}
> WEB: Can't use url '/site1/Shared%20Documents/test/aaa bbb.txt' because it is
> badly formed: Illegal character in path at index 34:
> /site1/Shared%20Documents/test/aaa bbb.txt
> {noformat}
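The wording of the log message above appears to match what java.net.URI reports through URISyntaxException, since URI rejects any character for which Character.isSpaceChar is true. A minimal sketch to reproduce it outside of MCF, assuming the failing character is U+3000 (the class and method names here are hypothetical):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical reproduction, NOT ManifoldCF code.
class FullWidthSpaceRepro {

    // Returns the index at which java.net.URI rejects the string,
    // or -1 if the string parses cleanly.
    static int firstIllegalIndex(String url) {
        try {
            new URI(url);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // \u3000 is the full-width (ideographic) space from the report.
        String path = "/site1/Shared%20Documents/test/aaa\u3000bbb.txt";
        System.out.println(firstIllegalIndex(path));
    }
}
```

The same check can serve as a small regression test for any fix: the raw string should fail to parse, and the string produced by the fix should return -1.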