On Thu, 2003-07-24 at 22:01, John Haywood wrote:
> The file in question is a corrupt Microsoft Entourage message file. It is
> 1.8Gig in size (approx). I need to step through it and convert it to an mbox
> format file, by searching for patterns such as :
>
> received: from <name>
> Received: from <name>
>
> and replace these with:
>
> >From <name>
Personally I would do it in Python, but then again that's what I use to
code with :-) If you do it correctly (i.e. read the file a line at a
time instead of loading it all at once), you could do it with very
little RAM, even on a pokey PC.

For the change you mention above, you can use this (paste it into a text
file, call it replace.py, then chmod +x replace.py, then run it):
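#!/usr/bin/python
import re

# Matches "received: from <" at the start of a line, in any letter case.
r = re.compile('^received: from <', re.I)

f = open('mesg.txt')
fo = open('output.txt', 'w')
while 1:
    # Read one line at a time so the 1.8 GB file never sits in RAM.
    line = f.readline()
    if not line:
        break
    line = r.sub('>From <', line)
    fo.write(line)
f.close()
fo.close()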
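
Since the file is corrupt, it may contain bytes that a newer Python
refuses to decode as text. Here is a rough sketch of the same loop that
works on raw bytes instead. It assumes Python 3 and "\n" line endings
(a file with bare "\r" endings would come through as one huge line), and
the function name convert() is just something I made up for
illustration:

```python
import re

# Same pattern as above, written as a bytes regex so that undecodable
# bytes in a corrupt file cannot raise a decoding error.
PATTERN = re.compile(rb'^received: from <', re.I)


def convert(src_path, dst_path):
    """Copy src_path to dst_path, rewriting matching lines.

    Returns the number of lines that were rewritten.
    """
    count = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for line in src:  # iterates one line at a time
            new_line, n = PATTERN.subn(b'>From <', line)
            count += n
            dst.write(new_line)
    return count
```

You would call it as convert('mesg.txt', 'output.txt'). The returned
count gives a quick sanity check on how many lines were changed.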
>
> also, some messages start with
>
> From:
> Return Path:
I don't know what you want to do with these... I had a brief look at
the webpage you mention below, but I think it would be a nice exercise
for you to look at this: (!!)
http://www.amk.ca/python/howto/regex/
*grin*
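
To get you started, and without knowing what you want done with those
lines, here is a small sketch that only detects them. The pattern and
the name starts_message() are my own guesses, and I have allowed the
more usual "Return-Path:" spelling as well:

```python
import re

# Hypothetical pattern: a line beginning with "From:" or "Return Path:"
# (also accepting "Return-Path:"), in any letter case.
START = re.compile(r'^(From|Return[- ]Path):', re.I)


def starts_message(line):
    """Return True if the line begins with one of those headers."""
    return START.match(line) is not None
```

What you do once a line matches (write a separator, skip it, and so on)
is up to you.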
Please note that in Python whitespace is significant: the lines that
belong to the "while 1:" loop must be indented under it!
Damon