[ 
https://issues.apache.org/jira/browse/CONNECTORS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137164#comment-14137164
 ] 

Karl Wright commented on CONNECTORS-1034:
-----------------------------------------

Hi Edgardo,

First, since this is the same issue as CONNECTORS-956, and CONNECTORS-956 is 
still open, please let's close this issue and discuss your problem in that 
ticket.

Second, the issue is that SolrJ (and, apparently, Solr as well, to some extent) 
simply does not support field names containing characters that are outside 
a very specific set.  Until Solr changes this behavior, we cannot fix it.  Even 
if you managed to send a field that included an illegal character to SolrJ and 
therefore to Solr, there's no guarantee that that would work.  URL encoding is 
not ideal for this purpose, so if you could look up the list of disallowed 
field name characters, we could try to be more specific about which characters 
we encode and which we don't.
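
As an illustration of what "being more specific" could look like, here is a 
minimal sketch in Python.  It is not ManifoldCF's actual preEncode() code: the 
helper name encode_field_name, the allowed character set, and the "_XX" escape 
format are all assumptions, inferred only from the cmis:name -> cmis_3Aname 
example in the report.  The real list of disallowed characters would still 
have to be looked up.

```python
import re

# Assumption: only ASCII letters, digits and underscore are safe in a field
# name. The characters Solr/SolrJ really disallow would replace this set.
_UNSAFE = re.compile(r"[^A-Za-z0-9_]")


def encode_field_name(name: str) -> str:
    """Hypothetical helper: escape each unsafe character as '_' + hex byte(s)."""
    def escape(match: "re.Match[str]") -> str:
        # Encode the character as UTF-8 and emit one _XX group per byte.
        return "".join("_%02X" % b for b in match.group(0).encode("utf-8"))

    return _UNSAFE.sub(escape, name)


print(encode_field_name("cmis:name"))  # cmis_3Aname
print(encode_field_name("title"))      # title (unchanged)
```

Note that this sketch leaves "_" itself alone, so the mapping is not 
reversible; a production version would need to decide how to handle that.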

Third, the behavior of SolrJ with regard to this issue is very broken.  SolrJ 
originally did not do anything to ensure that legal XML was generated for field 
names, because they assumed that nobody would be using field names that 
contained illegal characters.  So, no encoding at all will almost certainly 
lead to badly formed XML for many or even most documents, unless SolrJ has been 
changed to address this issue.  (I opened a SOLR ticket for this problem, but 
the Solr team declined to fix it for many releases, and since then I've lost 
track.)
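
To make the XML concern concrete, the following Python sketch contrasts 
pasting a field name into Solr's XML update format unescaped with escaping it 
first.  It does not reproduce SolrJ's code; the two helper functions and the 
sample field name are hypothetical.  A colon, as in cmis:name, would not by 
itself break an attribute value, so the sample name uses a quote and an 
ampersand to trigger the problem.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape, quoteattr


def field_xml_naive(name: str, value: str) -> str:
    # No escaping: the name is pasted straight into the attribute.
    return '<field name="%s">%s</field>' % (name, value)


def field_xml_escaped(name: str, value: str) -> str:
    # quoteattr() adds the surrounding quotes and escapes &, <, > and quotes.
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))


def is_well_formed(xml_text: str) -> bool:
    try:
        xml.dom.minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


name = 'a"b&c'  # hypothetical field name with XML-significant characters
print(is_well_formed(field_xml_naive(name, "v")))    # False
print(is_well_formed(field_xml_escaped(name, "v")))  # True
```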

Fourth, now we have backwards compatibility issues, because people have named 
their Solr fields based on ManifoldCF's workaround behavior to the above 
problems.  Your suggestion of a UI switch would address ONLY this last issue.

So, given all that, let's continue the discussion in the CONNECTORS-956 ticket, 
and I'll close this one.



> Manifold 1.7
> ------------
>
>                 Key: CONNECTORS-1034
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1034
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Solr-4.x-component
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Edgardo Ambrosi
>              Labels: patch
>
> Following the issue CONNECTORS-956, since the behavior makes ManifoldCF 
> unusable for Alfresco-Solr-based environments, because it is impossible to 
> correctly populate Solr, could you provide at least a solution such as 
> a checkbox in the "job specification" JSP page, tab "Solr Field Mapping", 
> near "Keep All Metadata", to choose whether to apply preEncode() or not?
> Our Use Case is: 
> Alfresco Server 4.2 enterprise, ManifoldCF, Solr server 4.7.1.
> Set a repo connection type CMIS, 
> Set a output connection type Solr, 
> Set a job with cmis query as "select * from cmis:document" (the repo has only 
> 1 document),
> The job runs and ends normally, but...
> querying Solr, the result set reports a strange encoding of the field name:
> if in Alfresco the field is named: cmis:name
> then in Solr, after ManifoldCF has populated it, the index contains the encoded 
> field as cmis_3Aname
> Best



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
