[
https://issues.apache.org/jira/browse/CONNECTORS-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137177#comment-14137177
]
Karl Wright commented on CONNECTORS-956:
----------------------------------------
Another person has opened a ticket that duplicates this one.
This ticket has not been fixed because SolrJ still generates illegal XML when
arbitrary characters are used in field names. So SOME encoding is essential for
field names to be transmitted to Solr correctly. The Solr/Lucene team also
tightly restricts the characters that can be used in field names, so even if
this problem is fixed in MCF, there's a good chance you still won't be able to
use whatever funky field name your repository connector comes up with in Solr
itself.
Given all that, I still believe that URL encoding is probably too restrictive,
in that some characters that are legal in field names wind up getting encoded,
so we can try to introduce an option for a different encoding. But this is
unlikely to satisfy everybody, since the problem is fundamentally a
Solr restriction.
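
As an illustration of what "a different encoding" could look like, here is a minimal sketch that escapes only the characters outside a small allowed set. It is not MCF code: the class and method names, the allowed set (ASCII letters, digits, '_', '-', '.'), and the "_xx" hex escape format are all assumptions made for this example, and whether Solr accepts the resulting names is a separate question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the MCF implementation.
class FieldNameEncoderSketch {

  // Assumed "safe" set: ASCII letters, digits, '_', '-', '.'.
  static boolean isSafe(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_' || c == '-' || c == '.';
  }

  // Copies safe characters through unchanged; every other character is
  // written as "_xx" for each of its UTF-8 bytes (lowercase hex).
  static String encode(String fieldName) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fieldName.length(); ) {
      int cp = fieldName.codePointAt(i);
      int len = Character.charCount(cp);
      if (len == 1 && isSafe((char) cp)) {
        sb.append((char) cp);
      } else {
        byte[] bytes =
            fieldName.substring(i, i + len).getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
          sb.append('_').append(String.format("%02x", b & 0xff));
        }
      }
      i += len;
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(
        encode("{http://www.alfresco.org/model/system}store_identifier"));
  }
}
```

Because '_' is both a safe character and the escape prefix, this sketch is not reversible and two different names could in principle collide; a real option would need to settle that.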
> Field names are URL encoded
> ---------------------------
>
> Key: CONNECTORS-956
> URL: https://issues.apache.org/jira/browse/CONNECTORS-956
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 1.6.1
> Reporter: Piergiorgio Lucidi
> Assignee: Karl Wright
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> The field names provided by some repositories such as Alfresco are based on
> a URI similar to:
> {code}
> {http://www.alfresco.org/model/system}store_identifier
> {code}
> But in Solr we found the following field name:
> {code}
> http_3a_2f_2fwww_alfresco_org_2fmodel_2fsystem_2f1_0_7dstore_identifier
> {code}
> The code involved in the Solr connector is the following:
> {code}
> protected static String preEncode(String fieldName)
> {
>   return URLEncoder.encode(fieldName);
> }
> {code}
> We should probably try to solve it by removing the preEncode invocation.
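
To see what URL encoding alone does to the sample name above, here is a small sketch using the standard java.net.URLEncoder. The class name is hypothetical, and the two-argument encode overload with an explicit charset is used instead of the single-argument call in the quoted snippet. Note that this call yields percent escapes and leaves '.' untouched, so the underscores in the field name observed in Solr presumably come from a later step that is not shown in this ticket.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the connector.
class UrlEncodeDemo {

  // Same idea as the quoted preEncode, but with an explicit charset.
  static String preEncode(String fieldName) {
    try {
      return URLEncoder.encode(fieldName, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required on every Java platform, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Prints: %7Bhttp%3A%2F%2Fwww.alfresco.org%2Fmodel%2Fsystem%7Dstore_identifier
    System.out.println(
        preEncode("{http://www.alfresco.org/model/system}store_identifier"));
  }
}
```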