Re: [Mailman-Developers] Killing off Pipermail and the effects on scrubbing in Mailman 3

2012-03-17 Thread Barry Warsaw

On Mar 16, 2012, at 10:11 AM, Mark Sapiro wrote:

There are two things going on. There is content filtering, i.e.,
removal from the message of parts with unwanted MIME types or filename
extensions. These parts are simply removed by pipeline/mime_delete.py
(which probably needs some changes ported from 2.1, aargh...).
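A minimal sketch of the content-filtering step described above, using only the standard library `email` package. This is not the code in pipeline/mime_delete.py; the names `filter_parts`, `BANNED_TYPES` and `BANNED_EXTS` are hypothetical, and the banned values are illustrative assumptions.

```python
import os
from email.message import Message

# Hypothetical defaults for illustration; a real list would configure these.
BANNED_TYPES = frozenset({"application/x-msdownload"})
BANNED_EXTS = frozenset({".exe", ".scr"})


def filter_parts(msg: Message) -> Message:
    """Drop, in place, any sub-part with a banned MIME type or extension."""
    if not msg.is_multipart():
        return msg
    kept = []
    for part in msg.get_payload():
        ext = os.path.splitext(part.get_filename() or "")[1].lower()
        if part.get_content_type() in BANNED_TYPES or ext in BANNED_EXTS:
            continue  # simply removed; nothing is stored aside
        kept.append(filter_parts(part))  # recurse into nested multiparts
    msg.set_payload(kept)
    return msg
```

Unlike scrubbing, nothing here is saved or replaced by a link: a banned part just disappears from the message.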

Yeah, that's embarrassing ;).  I've started down the road of adding unittests
for the code in that module.  You'll see the start of that land momentarily.

Then there is what pipeline/scrubber.py does with the remaining
message which is remove those message parts which can't be rendered
well in a flat, text/plain message and store them aside and replace
them by links in the message. The part we can't do in MM 3 is
calculate a URL to display/download them.

Yep.

 The easiest thing to do, and what I will probably do in my 
 'death-to-pipermail' branch is to simply scrub out the unwanted
 parts *after* a copy of the message is sent to the archive queue,
 but *before* the message is sent to the digest, usenet, and
 outgoing queues.

I'm not sure about the *before* with respect to usenet and digest and
certainly outgoing. Currently in 2.1, we don't scrub (as opposed to
content filter) non-digest deliveries unless scrub_nondigest is Yes.
We maybe should just drop that option.

We also don't scrub messages for the MIME digest.

I also don't think we scrub messages destined for usenet. I think we
let usenet worry about that in the same way we propose to let whatever
archiver is configured worry about it.

I don't see a need to handle these differently in MM 3.

ISTM that the essence of the scrubber is to turn any remaining text/html parts
into plain text, by various means.  I think the MM2 scrubber.py module is
essentially hopeless, but the basic functionality is useful.  I've decided to
remove the scrubber in the Pipermail-eradication branch, which will also land
momentarily.  I think it would be useful though to rewrite the scrubber, boil
it down to its essential functionality, and add that to the appropriate spot
in the pipeline.

How would you like to take a crack at that?

 For now, I'm going to try to implement sending an unscrubbed copy
 of the message to the archivers and just throwing up our hands for
 the copy of the message sent to the list members.  The nice
 side-effect of this is that it makes the scrubber *way* simpler!

Perhaps we could keep the scrubber as is except for modifying it to
not store scrubbed parts and put some kind of apology in the message
rather than the link to the no longer stored content.
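The idea of scrubbing without storing anything could look roughly like the sketch below. It is an assumption-laden illustration, not the MM 2.1 or MM 3 scrubber: `scrub_to_plain` and the wording of `NOTICE` are hypothetical, and every non-text/plain leaf part is replaced by the notice.

```python
from email.message import Message

# Hypothetical apology text; the real wording would be up to the list.
NOTICE = "[An attachment of type {ctype} was scrubbed and is not available.]"


def scrub_to_plain(msg: Message) -> str:
    """Flatten a message to plain text, replacing other parts by a notice."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        ctype = part.get_content_type()
        if ctype == "text/plain":
            raw = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "utf-8"
            chunks.append(raw.decode(charset, errors="replace"))
        else:
            # Nothing is saved to disk, so there is no URL to compute.
            chunks.append(NOTICE.format(ctype=ctype))
    return "\n".join(chunks)
```

Because no part is stored, the scrubber never needs to know where an archiver would serve it from.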

Then my lp:~msapiro/mailman/scrubber-fix branch would still be relevant ;)

Yeah, sorry about that. ;)  I think scrubber.py was just too nasty to
salvage.  Something much simpler would still be useful though.

Cheers,
-Barry
___
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: 
http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org

Security Policy: http://wiki.list.org/x/QIA9


Re: [Mailman-Developers] Grackle archive framework

2012-03-17 Thread Aamir Khan
On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw ba...@list.org wrote:

 On Feb 16, 2012, at 11:40 PM, Aamir Khan wrote:

 I talked to people on #launchpad-dev, as Barry suggested, to investigate
 the possibility of using Grackle as the new archive framework for
 Mailman. The project isn't functional yet. There are two parts: a client
 and a back-end service to store the messages. The client will hopefully be
 completed by the end of next week.
 
 I am also thinking of getting involved in developing the back-end service.
 They are planning to have a Cassandra-based store for the messages. Any
 thoughts about which back-end service would be ideal for storing messages
 for Mailman installations?

 On IRC, we talked about a storm + Python mailbox library based backend,
 with a REST+JSON wsgi based application vending the data.  This would allow
 us to integrate fairly easily with MM3 I think, and would possibly better
 enable some of the archiver work being done by Terri and others.


I understand that we will store the messages in .mbox format. But I don't
understand why we need to use storm for archiving.


 -Barry


-- 
Aamir Khan | 3rd Year | Computer Science & Engineering | IIT Roorkee


Re: [Mailman-Developers] Grackle archive framework

2012-03-17 Thread Barry Warsaw
On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote:

On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw ba...@list.org wrote:
 On IRC, we talked about a storm + Python mailbox library based backend,
 with a REST+JSON wsgi based application vending the data.  This would allow
 us to integrate fairly easily with MM3 I think, and would possibly better
 enable some of the archiver work being done by Terri and others.


I understand that we will store the messages in .mbox format. But I don't
understand why we need to use storm for archiving.

I meant to say maildir.  Please let's not use mbox format!  It's way too
easy to corrupt the file, as we did with a bug once in MM2.1, and we've paid
the price ever since.
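As a rough illustration of the maildir approach, here is a sketch using the standard library `mailbox` module. The helper name `archive_message` is hypothetical. Each message becomes its own file, which the library first writes under `tmp/` and then moves into place, so a failed write cannot damage messages already stored.

```python
import mailbox
from email.message import Message


def archive_message(maildir_path: str, msg: Message) -> str:
    """Store one message in a maildir and return the key assigned to it."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        # One file per message: no shared file to append to or lock.
        key = box.add(msg)
    finally:
        box.close()
    return key
```

The returned key is what a database row could later point at.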

As for archiving, it isn't strictly necessary to use storm, it's just a nice
lightweight ORM I happen to like.  But I think it *does* make sense to back a
full-fledged archiver with a database and a full-text search engine.  For
example, using our RFC 5064+X-Message-ID-Hash scheme, the database would
handle the lookup from hash to actual message storage location.
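A sketch of that lookup follows, with two loud assumptions. First, the hash is taken to be the base32-encoded SHA-1 digest of the Message-ID without its angle brackets. Second, the standard library `sqlite3` module stands in for storm, purely to keep the example self-contained. The table layout and function names are hypothetical.

```python
import base64
import hashlib
import sqlite3


def message_id_hash(message_id: str) -> str:
    """Assumed scheme: base32(SHA-1(Message-ID without angle brackets))."""
    bare = message_id.strip().lstrip("<").rstrip(">")
    digest = hashlib.sha1(bare.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS location ("
        "hash TEXT PRIMARY KEY, maildir_key TEXT NOT NULL)"
    )
    return db


def record(db: sqlite3.Connection, message_id: str, maildir_key: str) -> str:
    """Remember where the message with this Message-ID is stored."""
    digest = message_id_hash(message_id)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO location VALUES (?, ?)",
            (digest, maildir_key),
        )
    return digest


def lookup(db: sqlite3.Connection, digest: str):
    """Return the storage key for a hash, or None if it is unknown."""
    row = db.execute(
        "SELECT maildir_key FROM location WHERE hash = ?", (digest,)
    ).fetchone()
    return row[0] if row else None
```

An archiver could then resolve a permalink built from the hash with a single indexed query instead of scanning the message store.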

Cheers,
-Barry



Re: [Mailman-Developers] Killing off Pipermail and the effects on scrubbing in Mailman 3

2012-03-17 Thread Mark Sapiro

On 3/17/2012 10:05 AM, Barry Warsaw wrote:
 
 ISTM that the essence of the scrubber is to turn any remaining
 text/html parts into plain text, by various means.  I think the MM2
 scrubber.py module is essentially hopeless, but the basic
 functionality is useful.  I've decided to remove the scrubber in
 the Pipermail-eradication branch, which will also land momentarily.
 I think it would be useful though to rewrite the scrubber, boil it
 down to its essential functionality, and add that to the
 appropriate spot in the pipeline.
 
 How would you like to take a crack at that?


Sure. Now that I actually have a bit of an idea of what's going on in
the MM 3 core, I'm happy to give it a go.

Next step for me may be to learn more about how the unittests fit
within their framework so I can create some.

Also, I need to figure out a better development platform for Windows
boxes. I had a perfect opportunity to scrap Windows altogether when
I had to recover from a hard drive crash on my main development box
last year, but the dice fell the wrong way on that one.

Anyway, Cygwin is not going to cut it for MM 3. At the sprint, I tried
installing MM 3 in a vagrant VM, but there was too much missing (e.g.,
no Python.h) and even 'apt-get install python-dev' didn't fix that. I
ended up working remotely inside a virtualenv on my production server.
That actually seems to work OK, but I'm afraid that as I get deeper
into it, there will be things I need to do that I won't want to do on
my production box.

Anyway, if anyone has any suggestions for me besides the obvious "bite
the bullet now and scrap Windows - it will only be worse later", I'm
interested. I suppose I could always dual-boot Windows and some Linux
side by side.

Maybe I can organize a sprint at the next PyCon - "Migrating to Linux
and killing Windows one PC at a time".

-- 
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: [Mailman-Developers] Killing off Pipermail and the effects on scrubbing in Mailman 3

2012-03-17 Thread Andrea Crotti

On 03/17/2012 11:43 PM, Mark Sapiro wrote:


Also, I need to figure out a better development platform for Windows
boxes. I had a perfect opportunity to scrap Windows altogether when
I had to recover from a hard drive crash on my main development box
last year, but the dice fell the wrong way on that one.

Anyway, Cygwin is not going to cut it for MM 3. At the sprint, I tried
installing MM 3 in a vagrant VM, but there was too much missing (e.g.,
no Python.h) and even 'apt-get install python-dev' didn't fix that. I
ended up working remotely inside a virtualenv on my production server.
That actually seems to work OK, but I'm afraid that as I get deeper
into it, there will be things I need to do that I won't want to do on
my production box.

Anyway, if anyone has any suggestions for me besides the obvious "bite
the bullet now and scrap Windows - it will only be worse later", I'm
interested. I suppose I could always dual-boot Windows and some Linux
side by side.

Maybe I can organize a sprint at the next PyCon - "Migrating to Linux
and killing Windows one PC at a time".





I would definitely suggest getting rid of Windows, but if you don't want to
or can't take that big step, there is no reason to dual-boot: just create a
Linux virtual machine and you're done :)

Unless you don't have much RAM, that is the best solution, and there are even
pre-made virtual machines here:
http://virtualboximages.com/

so it doesn't really take long to set up something that works :)


Re: [Mailman-Developers] Grackle archive framework

2012-03-17 Thread Aamir Khan
On Sun, Mar 18, 2012 at 4:24 AM, Barry Warsaw ba...@list.org wrote:

 On Mar 18, 2012, at 12:23 AM, Aamir Khan wrote:

 On Fri, Feb 17, 2012 at 12:55 AM, Barry Warsaw ba...@list.org wrote:
  On IRC, we talked about a storm + Python mailbox library based backend,
  with a REST+JSON wsgi based application vending the data.  This would
  allow us to integrate fairly easily with MM3 I think, and would possibly
  better enable some of the archiver work being done by Terri and others.
 
 
 I understand that we will store the messages in .mbox format. But I don't
 understand why we need to use storm for archiving.

 I meant to say maildir.  Please let's not use mbox format!  It's way too
 easy to corrupt the file, as we did with a bug once in MM2.1, and we've
 paid the price ever since.


I read up on the difference between maildir and mbox, and mbox is clearly
more prone to corruption than maildir. Maildir has the further advantage of
avoiding file-locking problems. But since each message will be stored in a
separate file, searching through them will not be fast enough. Using a
database alone also has problems: it will use more disk space and consume
more CPU cycles.

So we could store the messages in maildir format and keep a copy of them in
a database. We could then serve search requests with database queries
powered by a full-text search engine. But then there will be the problem of
keeping the maildir messages and the messages stored in the database
synchronized. What are your thoughts about it?
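One possible answer to the synchronization worry, sketched under assumptions: treat the maildir as the source of truth and periodically reconcile the database against it, so the index can always be rebuilt from the files. The function `reconcile` and the `indexed` table are hypothetical. Only the Subject header is stored here, where a real archiver would feed the body to a full-text engine.

```python
import mailbox
import sqlite3


def reconcile(box: mailbox.Maildir, db: sqlite3.Connection):
    """Make the index match the maildir; return (added, removed) counts."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS indexed ("
        "maildir_key TEXT PRIMARY KEY, subject TEXT)"
    )
    on_disk = set(box.keys())
    in_db = {row[0] for row in db.execute("SELECT maildir_key FROM indexed")}
    missing = on_disk - in_db  # stored but not yet indexed
    stale = in_db - on_disk    # indexed but no longer stored
    with db:  # one transaction for the whole pass
        for key in missing:
            subject = str(box[key].get("Subject", ""))
            db.execute("INSERT INTO indexed VALUES (?, ?)", (key, subject))
        for key in stale:
            db.execute("DELETE FROM indexed WHERE maildir_key = ?", (key,))
    return len(missing), len(stale)
```

With this design the database holds only derived data, so losing or corrupting it costs a re-index rather than any mail.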

As for searching the archive, there are solutions like Elasticsearch, Solr,
and Lucene. Can we use one of them to search directly through the maildir?


 As for archiving, it isn't strictly necessary to use storm, it's just a
 nice lightweight ORM I happen to like.  But I think it *does* make sense to
 back a full-fledged archiver with a database and a full-text search engine.
 For example, using our RFC 5064+X-Message-ID-Hash scheme, the database would
 handle the lookup from hash to actual message storage location.

 Cheers,
 -Barry




-- 
Aamir Khan | 3rd Year | Computer Science & Engineering | IIT Roorkee