[ 
https://issues.apache.org/jira/browse/CONNECTORS-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991368#comment-14991368
 ] 

Antonio David Pérez Morales commented on CONNECTORS-1251:
---------------------------------------------------------

[~jan0sch] are you using Solr as the index backend? 
The problem may be caused by the field configuration in your Solr schema, 
because the Confluence connector crawls content through the Confluence REST API 
and uses UTF-8 encoding (if I remember correctly), so umlauts should come 
through intact.

Can you add a FileSystemOutputConnector to your job and check whether the 
written files contain the umlauts?
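
To make that check less dependent on how an editor or terminal renders the 
file, a small script can look for the typical symptom of UTF-8 bytes being 
read as ISO-8859-1 (each umlaut turning into two characters starting with 
"Ã"). This is only a sketch: the function names are made up for illustration, 
it assumes the output files are UTF-8 text, and it only covers the 
ISO-8859-1 case, not other wrong decodings.

```python
import sys
from pathlib import Path

UMLAUTS = "äöüÄÖÜß"

# Map each garbled form (UTF-8 bytes misread as ISO-8859-1) to its umlaut.
MOJIBAKE = {ch.encode("utf-8").decode("latin-1"): ch for ch in UMLAUTS}


def find_mojibake(text):
    """Return the umlauts that occur in double-encoded form in text."""
    return sorted(ch for garbled, ch in MOJIBAKE.items() if garbled in text)


def inspect_file(path):
    """Report intact and garbled umlauts in one file (hypothetical helper).

    Raises UnicodeDecodeError if the file is not valid UTF-8, which is
    itself a hint that the encoding went wrong before the output stage.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {
        "intact": sorted(ch for ch in UMLAUTS if ch in text),
        "garbled": find_mojibake(text),
    }


if __name__ == "__main__":
    # Usage: python check_umlauts.py <file written by the output connector>
    for name in sys.argv[1:]:
        print(name, inspect_file(name))
```

If the files report only intact umlauts, the crawl side is probably fine and 
the index configuration is the more likely place to look; if they report 
garbled ones, the text was already damaged before it reached the output.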

> Confluence umlauts broken
> -------------------------
>
>                 Key: CONNECTORS-1251
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1251
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Confluence connector
>    Affects Versions: ManifoldCF 2.2
>         Environment: Ubuntu Linux 14.04
> Java 1.8.0_51-b16
> Tomcat 7.0.52
>            Reporter: Jens Grassel
>            Assignee: Antonio David Pérez Morales
>              Labels: umlauts, unicode
>             Fix For: ManifoldCF 2.3
>
>
> Hi,
> I've noticed that the confluence connector seems to be unable to cope with 
> special characters like umlauts (ä, ö, ü, etc.). In our index they are broken; 
> for example, {{ü}} becomes {{Ã¼}}.
> I tried to pipe the extracted content through the tika extractor but the 
> result was the same.
> Regards,
> Jens



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
