[
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970058#comment-15970058
]
Karl Wright commented on CONNECTORS-1410:
-----------------------------------------
[~kamaci]: So your claim is that:
{code}
InputStream is = msg.getInputStream();
... ingest ...
{code}
... is exactly the same as:
{code}
Object o = msg.getContent();
if (o instanceof Multipart) {
  Multipart mp = (Multipart) msg.getContent();
  for (int k = 0, n = mp.getCount(); k < n; k++) {
    Part part = mp.getBodyPart(k);
    String disposition = part.getDisposition();
    if ((disposition == null)) {
      MimeBodyPart mbp = (MimeBodyPart) part;
      if (mbp.isMimeType(EmailConfig.MIMETYPE_TEXT_PLAIN)) {
        rd.addField(EmailConfig.EMAIL_BODY,
            mbp.getContent().toString());
      } else if (mbp.isMimeType(EmailConfig.MIMETYPE_HTML)) {
        // handle html accordingly. Returns content with html tags
        rd.addField(EmailConfig.EMAIL_BODY,
            mbp.getContent().toString());
      }
    }
  }
} else if (o instanceof String) {
  rd.addField(EmailConfig.EMAIL_BODY, (String)o);
}
... ingest ...
{code}
If that's the case, then I should have caught this earlier; indexing the BODY
twice is just plain wrong. I think we should take out all references to
the BODY throughout the connector and just use the InputStream.
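To make the comparison concrete, here is a minimal sketch of how the two approaches could differ for a message that has an attachment. It does not use the JavaMail API: the sample multipart body, the boundary string, and the class and method names (BodyVsRawStreamSketch, bodyTexts) are all hypothetical, and the splitter is hand-rolled for illustration only. It assumes the stream holds the undecoded multipart body; whether msg.getInputStream() actually behaves that way in the connector would need to be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative only: a hand-rolled multipart split, NOT the JavaMail API.
final class BodyVsRawStreamSketch {

  // Hypothetical boundary and sample body for a message with one attachment.
  static final String BOUNDARY = "sketch-boundary";
  static final String RAW_BODY = String.join("\r\n",
      "--" + BOUNDARY,
      "Content-Type: text/plain; charset=us-ascii",
      "",
      "Hello, see the attached file.",
      "--" + BOUNDARY,
      "Content-Type: application/octet-stream",
      "Content-Disposition: attachment; filename=\"data.bin\"",
      "Content-Transfer-Encoding: base64",
      "",
      "3q2+7w==", // base64 of the bytes DE AD BE EF
      "--" + BOUNDARY + "--",
      "");

  /** Returns the text of text/plain or text/html parts that have no Content-Disposition. */
  static List<String> bodyTexts(String rawBody, String boundary) {
    List<String> texts = new ArrayList<>();
    for (String part : rawBody.split(Pattern.quote("--" + boundary))) {
      int sep = part.indexOf("\r\n\r\n"); // blank line between headers and content
      if (sep < 0) {
        continue; // preamble or closing delimiter, not a real part
      }
      String headers = part.substring(0, sep).toLowerCase(Locale.ROOT);
      if (headers.contains("content-disposition:")) {
        continue; // mirrors the "disposition == null" check in the quoted code
      }
      if (headers.contains("content-type: text/plain")
          || headers.contains("content-type: text/html")) {
        texts.add(part.substring(sep + 4).trim());
      }
    }
    return texts;
  }

  public static void main(String[] args) {
    // The raw body still carries the encoded attachment data.
    System.out.println("raw body contains attachment data: " + RAW_BODY.contains("3q2+7w=="));
    // The part-by-part walk keeps only the body text.
    System.out.println("body-only text: " + bodyTexts(RAW_BODY, BOUNDARY));
  }
}
```

Under these assumptions the raw body and the part-by-part walk are not equivalent: the former includes the attachment's encoded bytes, the latter only the text part.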
> Binary Attachment Data as Plain Text at Email Content
> -----------------------------------------------------
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
> Issue Type: Bug
> Components: Email connector
> Affects Versions: ManifoldCF 2.6
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch
>
>
> Previously, we indexed e-mails and their attachments together. CONNECTORS-1375
> changed this logic so that an e-mail and its attachments are indexed
> separately.
> However, there is a problem: the content field of an e-mail that has
> attachment(s) includes both the body and the attachments' binary content as
> plain text.
> Since we index attachments separately, we can index just the body as content
> instead of appending the e-mail body and all attachments' binary data as plain
> text.