Mime4J needs to be used with one InputStream per message, so you would
need to split the mbox file into individual messages first.
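
A rough sketch of such a split, in case it helps. It assumes the common
mbox convention that each message starts with a "From " separator line
at the beginning of the file or after a blank line. MboxSplitter and
split() are made-up names for illustration and are not part of Mime4J.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mime4J: splits an mbox stream into
// one byte[] per message so each can be parsed on its own stream.
class MboxSplitter {

    static List<byte[]> split(InputStream rawIn) throws IOException {
        InputStream in = new BufferedInputStream(rawIn);
        List<byte[]> messages = new ArrayList<>();
        ByteArrayOutputStream message = null;   // null until the first separator
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean previousLineBlank = true;       // start of file counts as a boundary
        int b;
        do {
            b = in.read();
            if (b != -1) {
                line.write(b);
            }
            // A line is complete at '\n' or at end of input.
            if ((b == '\n' || b == -1) && line.size() > 0) {
                byte[] bytes = line.toByteArray();
                if (previousLineBlank && startsWithFrom(bytes)) {
                    // Separator line: close the previous message, drop the separator.
                    if (message != null) {
                        messages.add(message.toByteArray());
                    }
                    message = new ByteArrayOutputStream();
                } else if (message != null) {
                    message.write(bytes, 0, bytes.length);
                }
                previousLineBlank = isBlank(bytes);
                line.reset();
            }
        } while (b != -1);
        if (message != null) {
            messages.add(message.toByteArray());
        }
        return messages;
    }

    private static boolean startsWithFrom(byte[] l) {
        return l.length >= 5 && l[0] == 'F' && l[1] == 'r'
                && l[2] == 'o' && l[3] == 'm' && l[4] == ' ';
    }

    private static boolean isBlank(byte[] l) {
        return (l.length == 1 && l[0] == '\n')
                || (l.length == 2 && l[0] == '\r' && l[1] == '\n');
    }
}
```

Each returned byte[] can then be wrapped in a new ByteArrayInputStream
and handed to stream.parse(...) one message at a time. The blank line
that precedes the next separator stays at the end of the previous
message body, and archives that use a different separator convention
would need a different boundary check.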

Bye,
Norman


2010/6/3 Johannes Zillmann <[email protected]>:
> Hi,
>
> I'm trying to parse this mbox file
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200602 with mime4j
> version 0.6.
> The parsing code is like this:
> --------------------------
> org.apache.james.mime4j.parser.MimeTokenStream stream = new MimeTokenStream();
> BufferedInputStream bufferedInputStream = new BufferedInputStream(new 
> FileInputStream("/Users/jz/Documents/workspace/ms/dap/modules/dap-conductor/src/data/mbox/200602"));
> while (bufferedInputStream.available() > 0) {
>     stream.parse(bufferedInputStream);
>     handleParse(stream);
>     System.out.println("---------------------------------------------");
> }
> --------------------------
>
> Some messages seem to be parsed correctly, but sometimes the parser ends a
> message in the middle of a body and starts the next one.
>
> The middle of a body:
> --------------------------
> Context.java:266)
>        at
> org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContex
> t.java:449)
>        at org.mortbay.util.Container.start(Container.java:72)
>        at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
>        at org.mortbay.util.Container.start(Container.java:72)
>        at
> org.apache.hadoop.mapred.JobTrackerInfoServer$HTTPStarter.run(JobTrackerInfo
> Server.java:101)
> --------------------------
>
> The next field:
> --------------------------
> FIELD: ainer.start(Container.java:      72)
>        at org.mortbay.http.HttpServer.doStart(HttpServer.java:753)
>        at org.mortbay.util.Container.start(Container.java:72)
>        at
> --------------------------
>
> Is mime4j appropriate for parsing the mbox format? Is there any configuration or
> trick that can help me here?
>
> best regards
> Johannes
>
>
