[ 
https://issues.apache.org/jira/browse/CONNECTORS-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137177#comment-14137177
 ] 

Karl Wright commented on CONNECTORS-956:
----------------------------------------

Another person has opened a ticket that is a duplicate of this one.

This ticket has not been fixed because SolrJ still generates illegal XML when 
arbitrary characters are used in field names.  So SOME encoding is essential 
for field names to be transmitted to Solr correctly.  The Solr/Lucene team 
also tightly restricts the characters that can be used in field names, so even 
if this problem is fixed in MCF, there's a good chance you still won't be able 
to use whatever funky field name your repository connector comes up with in 
Solr itself.

Given all that, I still believe that URL encoding is probably too restrictive, 
in that some characters which are legal in field names wind up getting 
encoded, so we can try to introduce an option for a different encoding.  But 
this is not likely to satisfy everybody, since the problem is fundamentally a 
Solr restriction.
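
To make the trade-off concrete, here is a rough sketch comparing plain URL 
encoding with one possible "different encoding" that escapes only the 
characters outside a caller-supplied allowed set.  The class name, the 
encodeExcept method, and the particular allowed set are all hypothetical and 
are not part of the connector; whether Solr accepts the resulting name is a 
separate question.  The sample name is the Alfresco-style one from the issue 
description.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.function.IntPredicate;

// Hypothetical sketch, not ManifoldCF code.
public class FieldNameEncodingSketch {

    // Plain URL encoding: only letters, digits, '.', '-', '*' and '_'
    // pass through unchanged; a space becomes '+'.
    public static String urlEncode(String fieldName) {
        try {
            return URLEncoder.encode(fieldName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Assumed alternative: percent-escape only the code points the caller
    // does not allow. '%' is always escaped so the result stays reversible.
    public static String encodeExcept(String fieldName, IntPredicate allowed) {
        StringBuilder out = new StringBuilder();
        fieldName.codePoints().forEach(cp -> {
            if (cp != '%' && allowed.test(cp)) {
                out.appendCodePoint(cp);
            } else {
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xff));
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "{http://www.alfresco.org/model/system}store_identifier";

        // Current behaviour: ':' and '/' are escaped along with the braces.
        System.out.println(urlEncode(name));

        // Example allowed set (an assumption): letters, digits, "_./:".
        IntPredicate allowed =
                cp -> Character.isLetterOrDigit(cp) || "_./:".indexOf(cp) >= 0;
        System.out.println(encodeExcept(name, allowed));
    }
}
```

With this allowed set only the braces are escaped, so the name stays readable, 
but the choice of allowed characters would have to match whatever SolrJ and 
Solr can actually handle.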


> Field names are URL encoded
> ---------------------------
>
>                 Key: CONNECTORS-956
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-956
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.6.1
>            Reporter: Piergiorgio Lucidi
>            Assignee: Karl Wright
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The field names provided by some repositories, such as Alfresco, are based 
> on a URI similar to:
> {code}
> {http://www.alfresco.org/model/system}store_identifier
> {code}
> But in Solr we found the following field name:
> {code}
> http_3a_2f_2fwww_alfresco_org_2fmodel_2fsystem_2f1_0_7dstore_identifier
> {code}
> The code involved in the Solr connector is the following:
> {code}
> protected static String preEncode(String fieldName)
>   {
>       return URLEncoder.encode(fieldName);
>   }
> {code}
> We should probably solve this by removing the preEncode invocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
