Hi,
I'm trying to parse this mbox file
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200602 with mime4j
version 0.6.
The parsing code looks like this:
--------------------------
org.apache.james.mime4j.parser.MimeTokenStream stream = new MimeTokenStream();
BufferedInputStream bufferedInputStream = new BufferedInputStream(
    new FileInputStream("/Users/jz/Documents/workspace/ms/dap/modules/dap-conductor/src/data/mbox/200602"));
while (bufferedInputStream.available() > 0) {
    stream.parse(bufferedInputStream);
    handleParse(stream);
    System.out.println("---------------------------------------------");
}
--------------------------
Some messages seem to be parsed correctly, but sometimes the parser ends a
message in the middle of a body and starts the next one there.
Here is the middle of such a body:
--------------------------
Context.java:266)
at
org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContex
t.java:449)
at org.mortbay.util.Container.start(Container.java:72)
at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
at org.mortbay.util.Container.start(Container.java:72)
at
org.apache.hadoop.mapred.JobTrackerInfoServer$HTTPStarter.run(JobTrackerInfo
Server.java:101)
--------------------------
The next field reported by the parser:
--------------------------
FIELD: ainer.start(Container.java: 72)
at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
at org.mortbay.util.Container.start(Container.java:72)
at
--------------------------
Is mime4j appropriate for parsing the mbox format? Is there any configuration
or trick that can help me here?
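
For reference, the workaround I am considering is to split the file into
individual messages myself and hand each one to mime4j separately. Below is a
minimal sketch of that idea. It assumes the archive separates messages with
lines starting with "From " and it does not undo ">From " escaping inside
bodies. The MboxSplitter class and its split method are hypothetical names of
my own, not part of mime4j.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mime4j: splits the lines of an mbox file
// into one string per message.
final class MboxSplitter {

    // Assumption: every message starts with a separator line beginning with
    // "From " (note the trailing space, which header lines like "From:" lack).
    static List<String> split(Iterable<String> lines) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith("From ")) {
                // A separator closes the previous message, if there was one.
                if (current != null) {
                    messages.add(current.toString());
                }
                // The separator line itself is dropped from the message text.
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
            // Lines before the first separator are ignored.
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string could then be turned back into bytes (for example with
ISO-8859-1, which maps bytes one to one) and passed to stream.parse(...)
through a ByteArrayInputStream, so that the parser only ever sees one message
per stream.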
Best regards,
Johannes