Hi all,

I'm trying to use Mime4J's MboxIterator to parse an mbox file that I downloaded from the Apache mailing list archive. After downloading the file, I attempt to parse it like so:

MboxIterator mboxIterator = MboxIterator.fromFile(file)
        .charset(StandardCharsets.ISO_8859_1)
        .build();
MimeStreamParser parser = new MimeStreamParser();
List<Email> emails = new ArrayList<>();
for (CharBufferWrapper w : mboxIterator) {
    var handler = new EmailContentHandler();
    parser.setContentHandler(handler);
    try {
        parser.parse(w.asInputStream(StandardCharsets.UTF_8));
        emails.add(handler.getEmail());
    } catch (MimeException | IOException e) {
        e.printStackTrace();
    }
}

When running this snippet, my program crashes with an IllegalArgumentException:

Exception in thread "main" java.lang.IllegalArgumentException: File A:\Programming\GitHub-andrewlalis\ApacheEmailDownloader\emails\hadoop.apache.org_common-dev_2006-01.mbox does not contain From_ lines that match the pattern '^From \S+@\S.*\d{4}$'! Maybe not be a valid Mbox or wrong matcher.
    at org.apache.james.mime4j.mboxiterator.MboxIterator.initMboxIterator(MboxIterator.java:107)
    at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:87)
    at org.apache.james.mime4j.mboxiterator.MboxIterator.<init>(MboxIterator.java:53)
    at org.apache.james.mime4j.mboxiterator.MboxIterator$Builder.build(MboxIterator.java:260)
    at nl.andrewl.mbox_parser.MBoxParser.parse(MBoxParser.java:26)
    at nl.andrewl.mbox_parser.MBoxParser.main(MBoxParser.java:43)

I have attached the file in question, for reference.

Is there something else I need to do in order to be able to read MBox files?
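
For what it's worth, here is a small standalone check of the pattern quoted in the exception against the first line of the attached file. It uses only java.util.regex, not Mime4J. The class name FromLineCheck and the relaxed pattern are my own illustration, not anything taken from the library. It suggests that the separator line "From MAILER-DAEMON ..." fails to match because the pattern expects an '@' in the sender field. Whether MboxIterator's builder lets me supply a different From_ pattern is an assumption I have not verified.

```java
import java.util.regex.Pattern;

public class FromLineCheck {
    // Pattern copied verbatim from the exception message.
    static final String REPORTED_PATTERN = "^From \\S+@\\S.*\\d{4}$";

    // Hypothetical looser pattern (my own, for illustration only):
    // same shape, but without requiring '@' in the sender field.
    static final String RELAXED_PATTERN = "^From \\S+.*\\d{4}$";

    // Returns true if the given line matches the given regular expression.
    static boolean matches(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        // First line of the attached mbox file.
        String fromLine = "From MAILER-DAEMON Tue Jan 31 18:39:33 2006";

        System.out.println("reported pattern matches: " + matches(REPORTED_PATTERN, fromLine));
        System.out.println("relaxed pattern matches:  " + matches(RELAXED_PATTERN, fromLine));
    }
}
```

If this diagnosis is right, the file itself is probably a valid mbox and only the matcher is too strict for separator lines whose sender is MAILER-DAEMON.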
From MAILER-DAEMON Tue Jan 31 18:39:33 2006
Return-Path: <hadoop-dev-return-1-apmail-lucene-hadoop-dev-archive=lucene.apache....@lucene.apache.org>
Delivered-To: apmail-lucene-hadoop-dev-arch...@locus.apache.org
Received: (qmail 52867 invoked from network); 31 Jan 2006 18:39:33 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199)
  by minotaur.apache.org with SMTP; 31 Jan 2006 18:39:33 -0000
Received: (qmail 86338 invoked by uid 500); 31 Jan 2006 18:39:33 -0000
Delivered-To: apmail-lucene-hadoop-dev-arch...@lucene.apache.org
Received: (qmail 86322 invoked by uid 500); 31 Jan 2006 18:39:33 -0000
Mailing-List: contact hadoop-dev-h...@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:hadoop-dev-h...@lucene.apache.org>
List-Unsubscribe: <mailto:hadoop-dev-unsubscr...@lucene.apache.org>
List-Post: <mailto:hadoop-...@lucene.apache.org>
List-Id: <hadoop-dev.lucene.apache.org>
Reply-To: hadoop-...@lucene.apache.org
Delivered-To: mailing list hadoop-...@lucene.apache.org
Delivered-To: moderator for hadoop-...@lucene.apache.org
Received: (qmail 55200 invoked by uid 99); 31 Jan 2006 18:27:54 -0000
X-ASF-Spam-Status: No, hits=0.0 required=10.0
        tests=
X-Spam-Check-By: apache.org
Message-ID: <973704829.1138730132730.javamail.j...@ajax.apache.org>
Date: Tue, 31 Jan 2006 18:55:32 +0100 (CET)
From: "Doug Cutting (JIRA)" <j...@apache.org>
To: hadoop-...@lucene.apache.org
Subject: [jira] Created: (HADOOP-1) initial import of code from Nutch
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org
X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N

initial import of code from Nutch
---------------------------------

         Key: HADOOP-1
         URL: http://issues.apache.org/jira/browse/HADOOP-1
     Project: Hadoop
        Type: Task
    Reporter: Doug Cutting
 Assigned to: Doug Cutting 


The initial code for Hadoop will be copied from Nutch.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

