Mime4j needs to be used with one InputStream per message, so you would need to split the mbox file into individual messages first and parse each one separately.
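
A minimal sketch of such a split, assuming the archive follows the common mbox convention where every message starts with a line beginning with "From " that comes right after a blank line (or at the start of the file). The class MboxSplitter and its split method are names made up for this example and are not part of Mime4j. It also reads the whole file into memory, which is only reasonable for small archives.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mime4j: cuts raw mbox bytes into one
// byte[] per message so each message can get its own InputStream.
final class MboxSplitter {

    /** Returns the messages in order, without their "From " separator lines. */
    static List<byte[]> split(byte[] mbox) {
        List<byte[]> messages = new ArrayList<>();
        ByteArrayOutputStream current = null;
        boolean previousLineBlank = true; // start of file counts as a boundary
        int pos = 0;
        while (pos < mbox.length) {
            // Find the end of the current line, keeping its '\n' if present.
            int end = pos;
            while (end < mbox.length && mbox[end] != '\n') {
                end++;
            }
            int next = Math.min(end + 1, mbox.length);
            int len = next - pos;

            if (previousLineBlank && startsWithFrom(mbox, pos, len)) {
                // Separator line: close the previous message, open a new one.
                if (current != null) {
                    messages.add(current.toByteArray());
                }
                current = new ByteArrayOutputStream();
            } else if (current != null) {
                current.write(mbox, pos, len);
            }
            previousLineBlank = isBlank(mbox, pos, len);
            pos = next;
        }
        if (current != null) {
            messages.add(current.toByteArray());
        }
        return messages;
    }

    private static boolean startsWithFrom(byte[] b, int pos, int len) {
        return len >= 5 && b[pos] == 'F' && b[pos + 1] == 'r'
                && b[pos + 2] == 'o' && b[pos + 3] == 'm' && b[pos + 4] == ' ';
    }

    private static boolean isBlank(byte[] b, int pos, int len) {
        return (len == 1 && b[pos] == '\n')
                || (len == 2 && b[pos] == '\r' && b[pos + 1] == '\n');
    }
}
```

Each returned byte[] can then be wrapped in a new ByteArrayInputStream and handed to stream.parse(...) the same way your loop does now. Note that a body line starting with "From " directly after a blank line would still be taken for a separator unless the archive escapes such lines (usually as ">From ").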

Bye,
Norman

2010/6/3 Johannes Zillmann <[email protected]>:
> Hi,
>
> I'm trying to parse this mbox file
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200602 with mime4j
> version 0.6.
> The parsing code is like this:
> --------------------------
> org.apache.james.mime4j.parser.MimeTokenStream stream = new MimeTokenStream();
> BufferedInputStream bufferedInputStream = new BufferedInputStream(new
> FileInputStream("/Users/jz/Documents/workspace/ms/dap/modules/dap-conductor/src/data/mbox/200602"));
> while (bufferedInputStream.available() > 0) {
>     stream.parse(bufferedInputStream);
>     handleParse(stream);
>     System.out.println("---------------------------------------------");
> }
> --------------------------
>
> Some messages seem to be parsed correctly, but sometimes the parser ends a
> message in the middle of a body and starts the next one.
>
> The middle of a body:
> --------------------------
> Context.java:266)
> at
> org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContex
> t.java:449)
> at org.mortbay.util.Container.start(Container.java:72)
> at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
> at org.mortbay.util.Container.start(Container.java:72)
> at
> org.apache.hadoop.mapred.JobTrackerInfoServer$HTTPStarter.run(JobTrackerInfo
> Server.java:101)
> --------------------------
>
> The next field:
> --------------------------
> FIELD: ainer.start(Container.java: 72)
> at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
> at org.mortbay.util.Container.start(Container.java:72)
> at
> --------------------------
>
> Is mime4j appropriate for parsing the mbox format? Is there any configuration or
> trick which can help me here?
>
> best regards
> Johannes
>
