On 6/1/11 12:44 AM, William Ashworth wrote:
I've searched for a bit and perhaps nothing is available (or I'm typing the 
wrong searches).

A client of mine is looking to integrate more tightly with their Mailman list.
There's an archive page, but we're trying to format it nicely for inclusion on
their website so that it matches the rest of the site when members view it. I
can see two possibilities right now...

1.  We write a custom PHP application to scrape and store the archive pages 
into a database to call later, however we want.

2.  We create an email address and subscribe it to the list. Any new messages 
will be checked via a PHP script we build and stored in the database, then 
we'll pull from our own archive format however we choose.

The only problem with #2 is that we lose 8 years of legacy emails that are 
already present in the archives. My best bet is to find some way to hook into 
the archives with PHP so that we can roll it into the rest of their complicated 
website. Using a development language other than PHP at this time would
conflict with the rest of their website applications.

I may be dreaming, but if there's some way to export the archive data to XML
nightly, we could then import that XML into a MySQL database each night, and
the sky's the limit... I simply don't know if there's a standardized way to
access the archived information, and scraping is very messy.

I'm completely new to Mailman. Any assistance you can offer to help get my 
bearings straight would be greatly appreciated.

I have done a bit of both for some applications. For #1, though, I read the list's mbox file instead of scraping the archive pages. The parsed messages go into a database, and new messages are added as they arrive. Since mbox is very close to the raw mail format, the import of the old archive and the handling of new mail can share a lot of common code.
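
As a rough illustration of the mbox approach, here is a minimal sketch in Python, chosen only because its standard library ships an mbox parser; the same steps carry over to PHP. The function names `import_mbox` and `plain_text_body`, the `messages` table layout, and the use of SQLite in place of MySQL are all assumptions made for the example, not anything Mailman provides. Messages are keyed on their Message-ID header, so running the import again only adds mail that is not already stored.

```python
import mailbox
import sqlite3


def plain_text_body(msg):
    """Return the first text/plain part of a message as a string ('' if none)."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the message
            return payload.decode("utf-8", errors="replace")
    return ""


def import_mbox(mbox_path, db_path):
    """Copy messages from an mbox file into SQLite; return how many were new.

    The table layout is hypothetical. Messages without a Message-ID are
    skipped, since that header is used as the de-duplication key.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               message_id TEXT PRIMARY KEY,
               sender     TEXT,
               subject    TEXT,
               sent       TEXT,
               body       TEXT)"""
    )
    box = mailbox.mbox(mbox_path, create=False)
    added = 0
    try:
        for msg in box:
            message_id = msg.get("Message-ID")
            if not message_id:
                continue
            cur = conn.execute(
                "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?)",
                (
                    str(message_id).strip(),
                    str(msg.get("From", "")),
                    str(msg.get("Subject", "")),
                    str(msg.get("Date", "")),  # raw header, not parsed
                    plain_text_body(msg),
                ),
            )
            added += cur.rowcount  # 1 if inserted, 0 if already present
        conn.commit()
    finally:
        box.close()
        conn.close()
    return added
```

Run nightly against the list's cumulative mbox file, this picks up the existing archive on the first pass and only the new messages afterwards. In a Mailman 2.x install that file commonly lives under archives/private/<listname>.mbox/, though the exact path depends on the installation.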

--
Richard Damon

------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
