[ https://issues.apache.org/jira/browse/CONNECTORS-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030893#comment-15030893 ]

Karl Wright commented on CONNECTORS-1251:
-----------------------------------------

I've looked at the code, and could find no obvious encoding issues.  
Specifically, I looked at this:

{code}
  private <T extends ConfluenceResource> ConfluenceResponse<T> responseFromHttpEntity(
      HttpEntity entity, ConfluenceResourceBuilder<T> builder)
      throws Exception {
    String stringEntity = EntityUtils.toString(entity);

    JSONObject responseObject;
    try {
      responseObject = new JSONObject(stringEntity);
      ConfluenceResponse<T> response = ConfluenceResponse
          .fromJson(responseObject, builder);
      if (response.getResults().size() == 0) {
        logger.debug("[Processing] No {} found in the Confluence response",
            builder.getType().getSimpleName());
      }

      return response;
    } catch (JSONException e) {
      logger.debug("Error parsing JSON response");
      throw new Exception();
    }

  }
{code}

... which calls EntityUtils.toString(), which should be sufficient.
Separately, I am somewhat concerned that doing all of this in memory is not a 
good idea: we usually stream content whose size can be unbounded, rather than 
converting it to a single string.
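
For illustration only, here is a minimal sketch of the two points above: decoding with an explicitly chosen charset, and copying content in chunks instead of building one big string. It uses only the JDK; the class and method names (StreamingDecodeSketch, copyDecoded, decode) are hypothetical and are not part of the connector. It also shows how UTF-8 bytes read as ISO-8859-1 turn one umlaut into two characters. Whether that is what happens in this issue is an assumption, not something confirmed here; it would depend on the charset the server declares and on the fallback the HttpClient version in use applies when none is declared.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not connector code.
class StreamingDecodeSketch {

  // Decodes the stream with the given charset and copies it to the sink
  // in fixed-size chunks, so the whole body is never held as one String.
  // Returns the number of chars written.
  static long copyDecoded(InputStream in, Charset charset, Writer sink) throws IOException {
    long total = 0;
    char[] buffer = new char[4096];
    try (Reader reader = new InputStreamReader(in, charset)) {
      int n;
      while ((n = reader.read(buffer)) != -1) {
        sink.write(buffer, 0, n);
        total += n;
      }
    }
    return total;
  }

  // Convenience wrapper for small inputs; only used here to compare results.
  static String decode(byte[] bytes, Charset charset) {
    StringWriter out = new StringWriter();
    try {
      copyDecoded(new ByteArrayInputStream(bytes), charset, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // U+00FC is the u-umlaut; in UTF-8 it is the two bytes 0xC3 0xBC.
    byte[] utf8 = "\u00fc".getBytes(StandardCharsets.UTF_8);

    // Decoded with the right charset, the umlaut survives.
    System.out.println(decode(utf8, StandardCharsets.UTF_8).equals("\u00fc"));

    // Decoded as ISO-8859-1, each byte becomes its own character
    // (U+00C3 followed by U+00BC).
    System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals("\u00c3\u00bc"));
  }
}
```

In the connector, the input stream would presumably come from entity.getContent(), and the charset would be taken from the response's Content-Type with an explicit fallback; that wiring is left out here because it has not been checked against the connector code.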


> Confluence umlauts broken
> -------------------------
>
>                 Key: CONNECTORS-1251
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1251
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Confluence connector
>    Affects Versions: ManifoldCF 2.2
>         Environment: Ubuntu Linux 14.04
> Java 1.8.0_51-b16
> Tomcat 7.0.52
>            Reporter: Jens Grassel
>            Assignee: Antonio David Pérez Morales
>              Labels: umlauts, unicode
>             Fix For: ManifoldCF 2.3
>
>
> Hi,
> I've noticed that the Confluence connector seems to be unable to cope with 
> special characters like umlauts (ä, ö, ü, etc.). In our index they are broken; 
> for example, {{ü}} becomes {{Ã¼}}.
> I tried to pipe the extracted content through the tika extractor but the 
> result was the same.
> Regards,
> Jens



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
