Hello Andrew,

I'm the original author of this code.
I wrote it a long time ago :), so I'm happy to see it's being used.

The mbox file you sent contains only one message.
The code I wrote might not take this into account.
(Contributions are welcome.)
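For illustration only, here is a minimal sketch of splitting mbox text at From_ lines so that a file containing a single message still yields exactly one message: the last (or only) message simply runs to the end of the text. It uses plain java.util.regex and the default pattern quoted in the exception below. It is not MboxIterator's actual implementation, and the class name MboxSplitSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MboxSplitSketch {
    // Default From_ pattern, copied from the MboxIterator exception message.
    static final Pattern FROM_LINE =
            Pattern.compile("^From \\S+@\\S.*\\d{4}$", Pattern.MULTILINE);

    /**
     * Splits mbox text into messages, each starting at a From_ line.
     * Text before the first From_ line is ignored; a file with a single
     * From_ line yields a list with one element.
     */
    static List<String> split(String mbox, Pattern fromLine) {
        List<String> messages = new ArrayList<>();
        Matcher m = fromLine.matcher(mbox);
        int start = -1;
        while (m.find()) {
            if (start >= 0) {
                // The previous message ends where the next From_ line begins.
                messages.add(mbox.substring(start, m.start()));
            }
            start = m.start();
        }
        // The last (or only) message runs to the end of the text.
        if (start >= 0) {
            messages.add(mbox.substring(start));
        }
        return messages;
    }

    public static void main(String[] args) {
        String single = "From alice@example.org Mon Jan 02 10:00:00 2006\n"
                + "Subject: hi\n\nbody\n";
        System.out.println(split(single, FROM_LINE).size());
    }
}
```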

Also, you should be able to specify your own From_ line regex if the default one doesn't match your file.
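As an illustration of why a custom pattern can matter, this standard-library check compares the default From_ pattern (quoted in the exception below) with a hypothetical looser pattern that drops the '@' requirement on the sender. The sample lines are made up; "From MAILER-DAEMON ..." is simply a common From_ form without an '@', which the default pattern rejects. The class name FromLinePatternCheck and the RELAXED_FROM pattern are assumptions for this sketch.

```java
import java.util.List;
import java.util.regex.Pattern;

public class FromLinePatternCheck {
    // Default pattern quoted in the MboxIterator exception message.
    static final String DEFAULT_FROM = "^From \\S+@\\S.*\\d{4}$";
    // Hypothetical looser pattern: any non-space sender, line still ends in a 4-digit year.
    static final String RELAXED_FROM = "^From \\S+ .*\\d{4}$";

    /** True if any line of the text matches the regex (MULTILINE, so ^/$ work per line). */
    static boolean matches(String regex, String text) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(text).find();
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "From alice@example.org Mon Jan 02 10:00:00 2006",
                "From MAILER-DAEMON Fri Jan 13 00:00:00 2006");
        for (String s : samples) {
            System.out.printf("default=%b relaxed=%b  %s%n",
                    matches(DEFAULT_FROM, s), matches(RELAXED_FROM, s), s);
        }
    }
}
```

The looser the pattern, the higher the risk of cutting a message at an unescaped "From " line in a body, so it is worth testing a candidate regex against the actual file first. The chosen regex would then go to the MboxIterator builder; the method for that is assumed here to be fromLine(String), so please check the MboxIterator.Builder Javadoc for your Mime4J version.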

Please share your results via the mailing list.
It will help others.

Regards,
Eugen

On 14.02.2022 18:52, Andrew Lalis wrote:
Hi all,

I'm trying to use Mime4J's MboxIterator to parse an Mbox file (originally obtained from the Apache mailing list archives). After downloading the Mbox file, I attempt to parse it like so:

MboxIterator mboxIterator = MboxIterator.fromFile(file)
        .charset(StandardCharsets.ISO_8859_1)
        .build();
MimeStreamParser parser = new MimeStreamParser();
List<Email> emails = new ArrayList<>();
for (CharBufferWrapper w : mboxIterator) {
    var handler = new EmailContentHandler();
    parser.setContentHandler(handler);
    try {
        parser.parse(w.asInputStream(StandardCharsets.UTF_8));
        emails.add(handler.getEmail());
    } catch (MimeException | IOException e) {
        e.printStackTrace();
    }
}

When running this snippet, my program crashes with an IllegalArgumentException:

Exception in thread "main" java.lang.IllegalArgumentException: File A:\Programming\GitHub-andrewlalis\ApacheEmailDownloader\emails\hadoop.apache.org_common-dev_2006-01.mbox does not contain From_ lines that match the pattern '^From \S+@\S.*\d{4}$'! Maybe not be a valid Mbox or wrong matcher.
    at org.apache.james.mime4j.mboxiterator.MboxIterator.initMboxIterator(MboxIterator.java:107)
    at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:87)
    at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:53)
    at org.apache.james.mime4j.mboxiterator.MboxIterator$Builder.build(MboxIterator.java:260)
    at nl.andrewl.mbox_parser.MBoxParser.parse(MBoxParser.java:26)
    at nl.andrewl.mbox_parser.MBoxParser.main(MBoxParser.java:43)

I have attached the file in question, for reference.

Is there something else I need to do in order to be able to read MBox files?

