On Sat, 2014-01-04 at 23:33 +0100, Mariano Kamp wrote:
> Hey guys.
>
>
> I want to learn Hadoop and use my gmail mbox file as a basis for that.
> That brought me to mime4j and for the most part this is working out
> great. Thank you, btw.
>
>
> But it took me a while to get my hands on a version of it. So I am
> wondering if this project is still active? The download link [1] is
> broken. Is this the official download link?
>
Hi Mariano

Unfortunately the project has not been very active. On the positive side,
that presents potential contributors with more opportunities.

>
> After some time I found apache-mime4j-*-0.8.0-SNAPSHOT.jars. But it is
> a higher version number as the [official|outdated] 0.7.2 and it also
> seems different then what I can see in the trunk. Or at least it
> seemed to me as the DefaultMessageBuilder is not part of the trunk,
> but in 0.8.0.
>

You should be using the stable branch for the time being:

https://svn.apache.org/repos/asf/james/mime4j/branches/apache-mime4j-0.7

> It seems that Mime4j doesn't like long lines in the input and so the
> parsing fails.
>

No, it does not, and for good reasons. You hardly want your mail server to
enter an infinite loop while trying to parse a malformed message. You can
disable the max line length / max header count limits (or increase them)
by using a custom MimeConfig object.

...

>
> But to put things in perspective. I processed my mails from the last
> ten years (100k+) and it only had issues with a few hundred. So it's
> not a biggie, but wanted to give you this feedback. And I can provide
> you with more examples if you need those.
>
>
> Furthermore I ran into a NPE as well.
>
>
> java.lang.NullPointerException
> at org.apache.james.mime4j.io.MimeBoundaryInputStream.<init>(MimeBoundaryInputStream.java:67)
> at org.apache.james.mime4j.stream.MimeEntity.createMimePartStream(MimeEntity.java:366)
> at org.apache.james.mime4j.stream.MimeEntity.advance(MimeEntity.java:320)
> at org.apache.james.mime4j.stream.MimeTokenStream.next(MimeTokenStream.java:368)
> at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:176)
> at org.apache.james.mime4j.message.DefaultMessageBuilder.parseMessage(DefaultMessageBuilder.java:316)
> at com.mboxanalytics.util.MboxUtil.parseMessage(MboxUtil.java:95)
> ...
>

This one is a bug. Could you please raise a JIRA for this defect and attach
the offending message to it?
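The following is a minimal, stdlib-only Java sketch that shows why the line-length limit discussed above is a safety feature. A reader that enforces a limit fails fast instead of buffering an unbounded line from a malformed message. A non-positive limit means "no limit", which as far as I understand mirrors the MimeConfig convention. BoundedLineReader and LineTooLongException are hypothetical names, not mime4j classes. In mime4j itself the limits are configured on MimeConfig (in the 0.7 branch, for example, setMaxLineLen and setMaxHeaderCount), and that MimeConfig is then passed to DefaultMessageBuilder via setMimeEntityConfig.

```java
import java.io.IOException;
import java.io.Reader;

// Thrown when a line exceeds the configured limit (hypothetical type, not mime4j's).
class LineTooLongException extends IOException {
    LineTooLongException(int limit) {
        super("Maximum line length limit (" + limit + ") exceeded");
    }
}

// Hypothetical illustration of a line reader that refuses to buffer
// arbitrarily long lines; this is not how mime4j is implemented internally.
class BoundedLineReader {
    private final Reader in;
    private final int maxLineLen; // a non-positive value disables the check

    BoundedLineReader(Reader in, int maxLineLen) {
        this.in = in;
        this.maxLineLen = maxLineLen;
    }

    // Returns the next line without its terminator, or null at end of input.
    String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        boolean readAnything = false;
        int c;
        while ((c = in.read()) != -1) {
            readAnything = true;
            if (c == '\n') {
                break; // end of the current line
            }
            if (c == '\r') {
                continue; // drop CR so CRLF and bare LF endings look the same
            }
            line.append((char) c);
            if (maxLineLen > 0 && line.length() > maxLineLen) {
                // Fail fast instead of buffering an unbounded amount of input.
                throw new LineTooLongException(maxLineLen);
            }
        }
        return readAnything ? line.toString() : null;
    }
}
```

Raising the limit instead of disabling it keeps the protection while still accepting the few messages that contain unusually long lines.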
>
>
>
> UnsupportedEncodingException and IllegalCharsetNameException available
> as well, but probably correct. Full list:
>
>

As far as UnsupportedEncodingException, IllegalCharsetNameException and
similar exceptions are concerned, one can easily avoid them, if so desired,
by providing a custom BodyFactory implementation that applies some
heuristics to map unsupported charsets to supported ones.

Hope this helps

Oleg
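The charset heuristics suggested above could look roughly like the following stdlib-only Java sketch. It resolves a charset label from a message to a Charset the JVM supports and falls back to a caller-chosen default instead of throwing. The class name CharsetFallback, its small alias table and the fallback policy are all hypothetical and are not part of mime4j. The code that plugs this into a custom BodyFactory is not shown, so the sketch stays free of library dependencies.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps charset labels found in mail to supported charsets.
class CharsetFallback {
    // Illustrative alias table (assumption): labels the JVM may not
    // recognise, mapped to charsets it does support.
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("unicode-1-1-utf-8", "UTF-8");
        ALIASES.put("latin-1", "ISO-8859-1");
    }

    // Returns a supported Charset for the given label, or the fallback.
    static Charset resolve(String label, Charset fallback) {
        if (label == null) {
            return fallback;
        }
        // Strip stray quotes and whitespace, then normalise case for lookup.
        String cleaned = label.replace("\"", "").trim().toLowerCase(Locale.ROOT);
        String candidate = ALIASES.getOrDefault(cleaned, cleaned);
        try {
            if (Charset.isSupported(candidate)) {
                return Charset.forName(candidate);
            }
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid label (e.g. contains a space); use fallback.
        }
        return fallback;
    }
}
```

Returning a fallback such as ISO-8859-1 or UTF-8 trades some mis-decoded characters for a message that still parses. For bulk analysis of an old mailbox, that is often the preferable outcome.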
