Hi, I have been using Tika for a while now without any problems, and I am a
big fan of the software. I wanted to do my part and report what I suspect
might be a bug.
My code uses two libraries, JavaMail and java-libpst, and I am unit testing
with Dumbster. The last unit test I built with Dumbster checks that all of
the attachments are appended correctly to the email I send, and that test
failed. After some nitty-gritty debugging, I discovered that if I placed a
System.out.println(in.read()); directly before the call to Tika, it printed
the expected value on the console. However, if I placed the same statement
right after the Tika call, it printed -1, which means the stream was already
at its end.
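One plausible explanation, offered as an assumption rather than a confirmed
diagnosis: if the stream passed to tika.detect() does not support mark/reset,
the detection code may wrap it in a BufferedInputStream and mark/reset that
wrapper. The wrapper's buffer fills from the caller's stream, so those bytes
are gone from the caller's point of view. The sketch below simulates that
assumed behavior with plain JDK classes; nonMarkable, sniff and
readAfterSniff are hypothetical helpers, not Tika APIs.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamConsumptionDemo {

    /** Hypothetical stand-in for an attachment stream that does not support mark/reset. */
    static InputStream nonMarkable(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    /**
     * Hypothetical detector mimicking the ASSUMED behavior: mark/reset the stream if it
     * supports marks, otherwise wrap it in a BufferedInputStream and mark/reset the wrapper.
     */
    static int sniff(InputStream in) throws IOException {
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        s.mark(64);
        int first = s.read(); // one read() fills the wrapper's buffer from the caller's stream
        s.reset();            // rewinds the wrapper only, not the caller's stream
        return first;
    }

    /** Runs detection, then reads one more byte from the caller's stream, like the println test. */
    static int readAfterSniff(InputStream in) throws IOException {
        sniff(in);
        return in.read();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "attachment bytes".getBytes(StandardCharsets.US_ASCII);

        // Non-markable stream: the wrapper's buffer pulled in every byte, so this prints -1.
        System.out.println(readAfterSniff(nonMarkable(data)));

        // Caller wraps once and keeps using the wrapper: reset() works, so this prints 97 ('a').
        System.out.println(readAfterSniff(new BufferedInputStream(nonMarkable(data))));
    }
}
```

The second call hints at why wrapping the stream yourself, and then reading
from that same wrapper, may avoid the -1.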
public void sendAsEmail(PSTMessage email, String parent, String dir)
        throws IOException, MessagingException, PSTException {
    String subject = email.getSubject();
    String to = primaryRecipientsEmail(email);
    String from = email.getSenderEmailAddress();
    if (!isValidEmailAddress(from)) {
        from = "[email protected]";
    }
    Properties props = new Properties();
    props.put("mail.transport.protocol", "smtp");
    props.put("mail.smtp.host", "localhost");
    props.put("mail.smtp.auth", "false");
    props.put("mail.debug", "false");
    props.put("mail.smtp.port", "3025"); // change back to 25
    Session session = Session.getDefaultInstance(props);
    Transport transport = session.getTransport("smtp");
    transport.connect();
    Message message = new MimeMessage(session);
    message.addHeader("Parent-Info", parent);
    message.addHeader("directory", dir);
    message.setSubject(subject);
    // multipart and messageBodyPart are not declared in the original snippet
    // (possibly fields); declared here so the method compiles on its own.
    Multipart multipart = new MimeMultipart();
    MimeBodyPart messageBodyPart = new MimeBodyPart();
    messageBodyPart.setText(email.getBody());
    multipart.addBodyPart(messageBodyPart);
    message.setFrom(new InternetAddress(from));
    message.setRecipients(Message.RecipientType.TO, InternetAddress.parse(to));
    try {
        String transportHeaders = email.getTransportMessageHeaders();
        String[] headers = parseTransporHeaders(transportHeaders);
        for (String header : headers) {
            // The body part is already in the multipart; adding it again
            // here would duplicate it once per header.
            messageBodyPart.addHeaderLine(header);
        }
    } catch (Exception e) {
        log.info("missing chunk in transport headers: " + e);
    }
    try {
        if (email.hasAttachments()) {
            int attachmentIndex = 0;
            while (attachmentIndex < email.getNumberOfAttachments()) {
                PSTAttachment attachment = email.getAttachment(attachmentIndex);
                InputStream in = attachment.getFileInputStream();
                if (attachment.getAttachMethod() != PSTAttachment.ATTACHMENT_METHOD_EMBEDDED
                        && attachment.getAttachMethod() != PSTAttachment.ATTACHMENT_METHOD_OLE) {
                    String filename = attachment.getFilename();
                    // Here is where I called Tika, for use in a method that has
                    // since been deprecated. After this call, in.read() returns -1.
                    String mime = tika.detect(in);
                    messageBodyPart = new MimeBodyPart();
                    messageBodyPart.attachFile(file); // `file` is not declared in the original snippet
                    messageBodyPart.setFileName(filename);
                    multipart.addBodyPart(messageBodyPart);
                } else {
                    log.info("not base 64 file: " + attachment.getFilename());
                }
                in.close();
                attachmentIndex++;
            }
        }
    } catch (Exception e) {
        log.info("failed attaching file: " + e);
    }
    message.setContent(multipart);
    transport.sendMessage(message, message.getAllRecipients());
    transport.close();
}
Following the advice of Ken Krugler, I thought I would share this on this
list to find out whether it is an error in my code or an issue in Tika.
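A minimal workaround sketch, assuming the detection call is what exhausts the
attachment stream: read each attachment into memory once, run detection on a
fresh ByteArrayInputStream over those bytes, and build the attachment from
the same bytes. To keep the example runnable with only the JDK,
URLConnection.guessContentTypeFromStream (which needs a mark-supporting
stream) stands in for tika.detect(); toBytes and detect are hypothetical
helper names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

class DetectThenAttach {

    /** Reads a stream fully into memory (on Java 9+, InputStream.readAllBytes() does the same). */
    static byte[] toBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Detects a content type from a fresh, markable stream over the bytes, so detection can
     * never starve a later reader. URLConnection.guessContentTypeFromStream is only a JDK
     * stand-in for tika.detect(...); it may return null for unrecognized content.
     */
    static String detect(byte[] content) throws IOException {
        return URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(content));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical attachment content standing in for attachment.getFileInputStream().
        byte[] gifHeader = {'G', 'I', 'F', '8', '9', 'a', 1, 0, 1, 0};
        byte[] content;
        try (InputStream in = new ByteArrayInputStream(gifHeader)) {
            content = toBytes(in); // read the attachment exactly once
        }
        String mime = detect(content);
        // Type and length both come from the same byte[], so the attachment is still complete.
        System.out.println(mime + ", " + content.length + " bytes");
    }
}
```

In sendAsEmail this could roughly become byte[] data = toBytes(in); followed
by detection on those bytes and
messageBodyPart.setDataHandler(new DataHandler(new ByteArrayDataSource(data, mime)))
using javax.activation.DataHandler and javax.mail.util.ByteArrayDataSource.
Tika's facade appears to offer a detect(byte[]) overload as well; check the
javadoc for your version. Alternatively, wrapping the attachment stream in a
single BufferedInputStream and passing that same object both to
tika.detect() and to whatever reads the attachment afterwards should also
keep the bytes, because reset() then rewinds the buffer you continue reading
from.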