[ https://issues.apache.org/jira/browse/CONNECTORS-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138642#comment-14138642 ]
Karl Wright commented on CONNECTORS-956:
----------------------------------------
As for what encoding to use, rather than UTF-8, please read this:
http://grokbase.com/t/lucene/solr-user/135bk8zyzp/solr-4-2-1-behavior-with-field-names-that-use-|-character
The rule is that Lucene/Solr field names should be valid Java identifiers,
optionally with embedded "." and "-" characters, and must not start with "$".
Solr does not enforce this, but only names that follow the rule are guaranteed
to work. For the actual Java identifier spec, read this:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.8
NOTE WELL that this EXCLUDES field names that include most punctuation, such as
":".
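The rule above can be checked mechanically. The following is a minimal sketch, assuming the rule exactly as stated in this comment (Java identifier, plus embedded "." and "-", no leading "$") and reading "embedded" as "not the first or last character". `FieldNameRule` and `isSafeFieldName` are hypothetical names, not part of ManifoldCF or Solr; the check relies only on `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` from the JDK.

```java
/**
 * Hypothetical helper (not part of ManifoldCF or Solr) that checks the
 * field-name rule described in this comment: a Java identifier, optionally
 * with embedded '.' and '-', never starting with '$'.
 */
final class FieldNameRule {

    private FieldNameRule() {}

    static boolean isSafeFieldName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        // First character: a Java identifier start, excluding '$'.
        int first = name.codePointAt(0);
        if (first == '$' || !Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Assumption: "embedded" means '.' and '-' may not be the last character.
        char last = name.charAt(name.length() - 1);
        if (last == '.' || last == '-') {
            return false;
        }
        // Remaining characters: Java identifier parts, plus '.' and '-'.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (cp != '.' && cp != '-' && !Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafeFieldName("store_identifier")); // true
        System.out.println(isSafeFieldName("cm.title-1"));       // true
        System.out.println(isSafeFieldName("cmis:name"));        // false (':')
        System.out.println(isSafeFieldName("$score"));           // false (leading '$')
        System.out.println(isSafeFieldName(
            "{http://www.alfresco.org/model/system}store_identifier")); // false
    }
}
```

Note that the Alfresco-style name from the issue fails on its very first character ("{"), long before the ":" and "/" characters are reached.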
Now the question is: should the Solr connector enforce this in some way, or
should we just let the documents be posted to Solr and let them crash and burn
there? People can already filter fields out using a document transformer, but
for some connectors (e.g. CMIS) setting up the field mapping correctly would be
quite a pain. I'm looking for ideas on how to make this work best.
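One possible way for the connector to enforce the rule, rather than letting Solr reject the document, would be to rewrite offending names before posting. The sketch below is only an illustration of that option, not a proposed ManifoldCF API: `FieldNameSanitizer` and `sanitize` are hypothetical, and the choices of "_" as the replacement character and as a prefix for names with an illegal first character are assumptions.

```java
/**
 * Hypothetical sketch (not part of ManifoldCF): rewrite a field name so that it
 * follows the rule above, replacing every disallowed character with '_'.
 */
final class FieldNameSanitizer {

    private FieldNameSanitizer() {}

    static String sanitize(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 1);
        // Keep Java identifier parts plus '.' and '-'; map everything else to '_'.
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (cp == '.' || cp == '-' || Character.isJavaIdentifierPart(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append('_');
            }
            i += Character.charCount(cp);
        }
        if (sb.length() == 0) {
            return "_";
        }
        // '.' and '-' must stay embedded, so they may not end the name.
        int lastIndex = sb.length() - 1;
        char last = sb.charAt(lastIndex);
        if (last == '.' || last == '-') {
            sb.setCharAt(lastIndex, '_');
        }
        // The name must start like a Java identifier, but not with '$'.
        int first = sb.codePointAt(0);
        if (first == '$' || !Character.isJavaIdentifierStart(first)) {
            sb.insert(0, '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: _http___www.alfresco.org_model_system_store_identifier
        System.out.println(sanitize("{http://www.alfresco.org/model/system}store_identifier"));
        System.out.println(sanitize("cmis:name")); // cmis_name
        System.out.println(sanitize("$score"));    // _$score
    }
}
```

The trade-off is that this mapping is lossy: "a:b" and "a/b" both become "a_b", so two distinct repository properties could collide in Solr. Dropping invalid fields instead avoids collisions but silently loses data, which is the same tension as with the document transformer approach.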
> Field names are URL encoded
> ---------------------------
>
> Key: CONNECTORS-956
> URL: https://issues.apache.org/jira/browse/CONNECTORS-956
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 1.6.1
> Reporter: Piergiorgio Lucidi
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> The field names provided by some repositories such as Alfresco are based on
> a URI similar to:
> {code}
> {http://www.alfresco.org/model/system}store_identifier
> {code}
> But in Solr we found the following field name:
> {code}
> http_3a_2f_2fwww_alfresco_org_2fmodel_2fsystem_2f1_0_7dstore_identifier
> {code}
> The code involved in the Solr connector is the following:
> {code}
> protected static String preEncode(String fieldName)
> {
> return URLEncoder.encode(fieldName);
> }
> {code}
> We should probably try to solve it by removing the preEncode invocation.
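To see what the quoted preEncode() step does to the example name, here is a minimal sketch that runs the same `URLEncoder` call on it. It passes an explicit "UTF-8" charset because the single-argument `URLEncoder.encode(String)` used in the quoted code is deprecated and depends on the platform default encoding. `PreEncodeDemo` is a hypothetical class name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical demo of the URL-encoding step quoted in the issue. */
final class PreEncodeDemo {

    private PreEncodeDemo() {}

    // Same idea as the quoted preEncode(), but with an explicit charset.
    static String preEncode(String fieldName) {
        try {
            return URLEncoder.encode(fieldName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints: %7Bhttp%3A%2F%2Fwww.alfresco.org%2Fmodel%2Fsystem%7Dstore_identifier
        System.out.println(preEncode("{http://www.alfresco.org/model/system}store_identifier"));
    }
}
```

The encoded result is full of "%" escapes, and "%" is not legal under the field-name rule described in the comment above. URL encoding therefore cannot produce a safe name on its own. The lower-case, underscore-separated form actually observed in Solr appears to come from a further rewrite that the report does not show.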
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)