Andre,

Unless I am missing something, I would stay on the side of "keeping it simple and modularized", where any extraction/transformation/modification etc. of anything is the job of another component. In fact, I faced a very similar question a few days ago about attachments from IMAP/POP3 etc. As you already mentioned, using MimeMessageParser is straightforward and allows one to restore an InputStream back into a javax.mail.Message, from which you can extract and get to pretty much anything. You are already doing that in the ExtractAttachment processor, so I would continue on that path.
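To make the "restore the stream into a message, then pull out what you need" idea concrete, here is a minimal sketch. It uses Python's standard `email` package rather than JavaMail or Commons Email, purely so it runs without extra dependencies; the function names and the sample message are made up for illustration and are not taken from the NiFi code base.

```python
import io
from email import message_from_binary_file, policy
from email.message import EmailMessage


def attachments_from_stream(stream):
    """Rebuild a message from a byte stream and return (filename, bytes) pairs."""
    msg = message_from_binary_file(stream, policy=policy.default)
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]


def build_sample():
    """Serialize a small message with one attachment (illustrative data only)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "nifi@example.com"
    msg["Subject"] = "report"
    msg.set_content("See attached.")
    msg.add_attachment(b"a,b\n1,2\n", maintype="application",
                       subtype="octet-stream", filename="data.csv")
    return msg.as_bytes()


if __name__ == "__main__":
    for name, payload in attachments_from_stream(io.BytesIO(build_sample())):
        print(name, len(payload))
```

The point is only that the parsing component receives bytes and nothing else, so any further transformation can live in a separate component downstream.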
That is of course my opinion, so it would be nice to see what others think.

Cheers
Oleg

> On Jul 24, 2016, at 8:38 AM, Andre <[email protected]> wrote:
>
> I have raised NIFI-2380 to track this improvement.
>
> While raising the ticket I was wondering:
>
> are you happy to give the user the option to choose whether to extract the
> winmail.dat or not?
>
> I mean something like:
>
> - PROPERTY: "Extract Attachments within a TNEF (i.e. winmail.dat)": true /
> false
>
> If yes, then every time a decoding occurs we test the name (or something
> better in case it is possible) and then extract it. An attachment created
> by a TNEF file would have an attribute email.attachment.tnefdecoded (or
> whatever name we decide) set to yes.
>
> If no, processing continues as it is today (i.e. purely based on Apache
> Commons MimeMessageParser).
>
>
> Another possible solution would be an additional processor but IMNSHO this
> would be overkill and counterproductive.
>
> Keen to hear your thoughts
>
> On Sun, Jul 17, 2016 at 4:46 PM, Andre <[email protected]> wrote:
>
>> Dan,
>>
>> Ingesting Microsoft Journals seems like a great suggestion for a new
>> processor (ParseExchangeJournal?).
>>
>> Regarding TNEF: As far as I know, Apache Commons Email does not parse
>> "winmail.dat" type attachments. As far as I understand, the only ASL
>> compatible implementation of a TNEF extractor is Apache POI, and even that
>> implementation is not part of POI's main release.
>>
>> If TNEF support is required we will either have to code it from scratch or
>> perhaps use https://github.com/koodaamo/tnefparse together with
>> ExecuteScript (although since tnefparse is LGPL, this solution cannot be
>> packaged as part of NiFi).
>>
>> Cheers
>>
>> On Sun, Jul 17, 2016 at 10:53 AM, djmdata <[email protected]> wrote:
>>
>>> What is the JIRA #?
>>>
>>> I have a production system that reads email from a custom SMTP listener
>>> and places the SMTP payload into Kafka. A Storm topology reads messages
>>> from Kafka and parses the emails (Java code using JavaMail API) into
>>> useful info (subject, text, attachments, body, etc...).
>>>
>>> I'm looking at plugging NiFi into this to replace the custom SMTP
>>> listener. If you had a processor that could act as a reliable (we can't
>>> lose emails) and performant SMTP listener alternative we would use it.
>>>
>>> Your "email parser processor" is an interesting idea - but beware of the
>>> mess you'll find in the wild with email. In our case, we try to parse
>>> Exchange (full of non-standard wonders like "TNEF" attachments) as well
>>> as email from virtually anywhere (GMail, Yahoo, Joe's email client...).
>>> If you can crack that you'll be on to something. We have even more
>>> complexity in that we read "Microsoft Journals" which wrap the standard
>>> SMTP layout in a Microsoft layer (you'll see this at large Exchange shops
>>> doing this kind of thing for use cases like compliance).
>>>
>>> --
>>> View this message in context:
>>> http://apache-nifi-developer-list.39713.n7.nabble.com/ListenSMTP-processor-tp10510p12827.html
>>> Sent from the Apache NiFi Developer List mailing list archive at
>>> Nabble.com.
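For reference, the optional "extract attachments within a TNEF" behaviour proposed in the quoted thread could look roughly like the sketch below. It is written against Python's standard `email` package rather than JavaMail, and it makes several assumptions: the MIME types used to recognise a TNEF container, the `extract_tnef` flag standing in for the processor property, and a caller-supplied `tnef_decoder` callable, which is hypothetical because no actual TNEF decoder is included here. Only the attribute name `email.attachment.tnefdecoded` comes from the thread.

```python
from email.message import EmailMessage

# Assumption: these MIME types (plus the winmail.dat name) mark a TNEF container.
TNEF_MIME_TYPES = {"application/ms-tnef", "application/vnd.ms-tnef"}
TNEF_ATTRIBUTE = "email.attachment.tnefdecoded"  # attribute name floated in the thread


def looks_like_tnef(part):
    """Test the attachment name first, then fall back to the content type."""
    filename = (part.get_filename() or "").lower()
    return filename == "winmail.dat" or part.get_content_type() in TNEF_MIME_TYPES


def extract_attachments(msg, extract_tnef, tnef_decoder):
    """Return (filename, payload, attributes) tuples for each attachment.

    tnef_decoder is a hypothetical callable mapping raw TNEF bytes to a list
    of (filename, payload) pairs; it is only called when extract_tnef is true.
    """
    results = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True)
        if extract_tnef and looks_like_tnef(part):
            # Property enabled: emit the inner files and flag them as decoded.
            for name, inner in tnef_decoder(payload):
                results.append((name, inner, {TNEF_ATTRIBUTE: "yes"}))
        else:
            # Property disabled (or not TNEF): behave exactly as before.
            results.append((part.get_filename(), payload, {}))
    return results


def sample_message():
    """Build a message carrying a fake winmail.dat (illustrative bytes only)."""
    msg = EmailMessage()
    msg["Subject"] = "from Exchange"
    msg.set_content("body")
    msg.add_attachment(b"raw-tnef", maintype="application",
                       subtype="ms-tnef", filename="winmail.dat")
    return msg
```

With the flag off, the winmail.dat passes through untouched, which matches the "processing continues as it is today" branch of the proposal.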
