[ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970058#comment-15970058 ]

Karl Wright commented on CONNECTORS-1410:
-----------------------------------------

[~kamaci]: So your claim is that:

{code}
InputStream is = msg.getInputStream();
// ... ingest ...
{code}

... is exactly the same as:

{code}
Object o = msg.getContent();
if (o instanceof Multipart) {
  // Reuse the content already fetched instead of calling getContent() again
  Multipart mp = (Multipart) o;
  for (int k = 0, n = mp.getCount(); k < n; k++) {
    Part part = mp.getBodyPart(k);
    String disposition = part.getDisposition();
    // Only parts without a Content-Disposition are treated as the body
    if (disposition == null) {
      MimeBodyPart mbp = (MimeBodyPart) part;
      if (mbp.isMimeType(EmailConfig.MIMETYPE_TEXT_PLAIN)) {
        rd.addField(EmailConfig.EMAIL_BODY, mbp.getContent().toString());
      } else if (mbp.isMimeType(EmailConfig.MIMETYPE_HTML)) {
        // Handle HTML accordingly: the content is returned with its HTML tags
        rd.addField(EmailConfig.EMAIL_BODY, mbp.getContent().toString());
      }
    }
  }
} else if (o instanceof String) {
  rd.addField(EmailConfig.EMAIL_BODY, (String) o);
}
// ... ingest ...
{code}

If that's the case, then I should have caught this earlier; having the BODY
indexed twice is just plain wrong. I think we should take out all references to
the BODY throughout the connector and just use the InputStream.
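A minimal Java sketch of the two behaviours being compared may help. It deliberately avoids JavaMail: `EmailContentSketch`, `SimplePart`, `allParts` and `bodyOnly` are hypothetical names over a simplified part model that is assumed to hold already-decoded text. `allParts` appends every part, attachments included, which is what the issue reports ending up in the content field; `bodyOnly` mirrors the quoted loop by keeping only parts with no disposition whose type is text/plain or text/html.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch contrasting two ways of building an e-mail's indexed content.
 * SimplePart is a HYPOTHETICAL stand-in for a MIME body part; it is not
 * the JavaMail API.
 */
public class EmailContentSketch {

    /** Hypothetical part: MIME type, disposition (may be null), decoded text. */
    static final class SimplePart {
        final String mimeType;
        final String disposition;
        final String content;

        SimplePart(String mimeType, String disposition, String content) {
            this.mimeType = mimeType;
            this.disposition = disposition;
            this.content = content;
        }
    }

    /** Appends every part, attachments included (the behaviour the issue reports). */
    static String allParts(List<SimplePart> parts) {
        StringBuilder sb = new StringBuilder();
        for (SimplePart p : parts) {
            sb.append(p.content);
        }
        return sb.toString();
    }

    /**
     * Mirrors the quoted loop: one entry per part that has no disposition and
     * is text/plain or text/html. Uses exact string comparison, which is
     * simpler than JavaMail's isMimeType (no parameter or case handling).
     */
    static List<String> bodyOnly(List<SimplePart> parts) {
        List<String> bodies = new ArrayList<>();
        for (SimplePart p : parts) {
            if (p.disposition != null) {
                continue; // any part with a disposition is skipped, as in the loop
            }
            if (p.mimeType.equals("text/plain") || p.mimeType.equals("text/html")) {
                bodies.add(p.content);
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        List<SimplePart> parts = List.of(
            new SimplePart("text/plain", null, "Hello, see attached."),
            new SimplePart("application/pdf", "attachment", "%PDF-1.4 ..."));
        System.out.println(allParts(parts)); // body followed by attachment bytes
        System.out.println(bodyOnly(parts)); // body only
    }
}
```

Running `main` prints `Hello, see attached.%PDF-1.4 ...` on the first line and `[Hello, see attached.]` on the second, which is the difference between appending attachment data to the content and indexing the body alone.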



> Binary Attachment Data as Plain Text at Email Content
> -----------------------------------------------------
>
>                 Key: CONNECTORS-1410
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Email connector
>    Affects Versions: ManifoldCF 2.6
>            Reporter: Furkan KAMACI
>            Assignee: Furkan KAMACI
>             Fix For: ManifoldCF 2.8
>
>         Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch
>
>
> Previously, we indexed e-mails and their attachments together. With
> CONNECTORS-1375 we changed this logic to index an e-mail and its attachments
> separately.
> However, there is a problem: the content field of an e-mail that has
> attachment(s) includes both the body and the attachments' binary content as
> plain text.
> Since we index attachments separately, we can index just the body as content
> instead of appending the e-mail body and all attachments' binary data as plain
> text.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
