On Fri, Aug 12, 2022 at 7:35 AM Storkman <[email protected]> wrote:
>
> On Wed, Aug 10, 2022 at 09:29:43PM +0200, Thomas Oltmann wrote:
> > Hi all!
> >
> > I think we can all agree that the current web archive over at
> > lists.suckless.org isn't all that great;
> > Author names get mangled, the navigation is terrible, some messages
> > are duplicated, some missing.
> >
> > That's why I've started looking into #3 of the 'Project Ideas' page
> > (https://suckless.org/project_ideas/) -- "Write a decent mailing list
> > Web archive system".
> > I see lots of potential to build something better than hypermail:
> >
> > - We could take text encodings more seriously.
> >   hypermail just copies the 'charset' notice over into the HTML
> >   file, which doesn't work when listing multiple messages.
> >
> > - We could use maildir instead of the really brittle mbox format for
> >   mailboxes.
> >   This might also help avoid message dropping/duplication, but I'm not
> >   sure about that.
> >
> > - We could try a different navigation scheme. Perhaps flat threads
> >   instead of a hierarchy?
> >   I don't really know how people here feel about this, but it's
> >   mentioned on the 'Project Ideas' page
> >   and I'm in favour of it. Navigating message trees is really confusing.
> >
> > - Bonus: We can ignore CGI, uuencode, HTML mail and all that cruft.
> >
> > Is there currently any interest in such a project here?
> >
> > So far, I've gone ahead and implemented a sort of proof-of-concept (at
> > https://www.github.com/tomolt/mailarchiver).
> > Of course I can't guarantee that this will go anywhere, as I only have
> > limited time and patience myself, but I can give it a try.
> >
> > Cheers,
> > Thomas Oltmann
> >
>
> Hi!
>
> When you list all these features, it sounds like everything a mailing list
> archive front-end does just replicates things our mail clients already
> do better, and without going through a web browser.
>
> So I thought, why not just serve the maildir files as-is, with monthly
> and yearly tarballs, and perhaps metadata files so you don't need to
> download everything just to make sure you've got an entire thread?
> But then, that would require additional instrumentation and would make e.g.
> referencing mailing list threads in commit messages slightly less convenient.

There's some overlap in functionality with mail clients, yes, but the
big difference IMO is that a mail archiver aggregates the mail traffic
and turns it into proper *documents* that can easily be _viewed_,
_distributed_ *and* _referenced_ by anyone.
It doesn't matter what format these documents are actually in - HTML,
plain text, PDF, whatever.
For example, if a newbie asks "Help, I can't apply the dwm-alpha patch"
you just want to be able to give him a link to the last time this was
answered.
Similarly, when you write a blog entry referencing recent discussions
on the mailing list, having some link or document that you can put in
your references is great.

But aside from that, additionally distributing a tarball might be a
really good idea for long-term archival. tar and RFC822 have been
around for some 40 years and will likely stay for another 100.

> In any case, I messed with the code a bit, running it on my own archive
> maildir. I've constructed a very crude threaded view[1], and came up with a
> few fixes in the process.
>
> Patch 2 is a rewrite of collapse_ws(), because I found it really hard to
> figure out what exactly it does and how. Your mileage may vary, but I
> think the original code would overflow the buffer backwards when given
> an empty input.
>
> For patch 3, I've found some e-mails in the wild that used a lowercase
> encoding in encoded-words, and the RFC says it's okay.
>
> Patch 4 might not be correct, because I'm not sure how decode_qprintable()
> can ever return without error when parsing an encoded-word in a header.
> It seems that it would just find the last "=" in "?=", set length to -2,
> and return NULL. Maybe I'm just not getting it. It did manage to process
> a few dozen more e-mails in my test runs, though.
>
> Hopefully I did this correctly and you can cherry-pick these commits
> to your taste.

Thanks a bunch. I'll take a closer look at your patches when I find some time.

>
> -- Storkman
>
> [1]: https://imgur.com/a/EbOblHt
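
For illustration, the two parsing points raised for patches 2 and 3 could be sketched in C roughly as follows. This is not the code from mailarchiver or from the patches. The names collapse_ws_sketch() and encoded_word_encoding() are hypothetical, and the exact behaviour of the real collapse_ws() (collapsing runs to one space, trimming both ends) is an assumption. The first function never reads before the start of the buffer, so an empty string is harmless. The second treats the encoding field of an RFC 2047 encoded-word as case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the project's collapse_ws(): collapse each run
 * of whitespace in s to a single space and drop leading and trailing
 * whitespace, in place. Returns the new length. The write index w never
 * passes the read index r and nothing is read before s[0], so empty
 * input is handled without a special case. */
size_t
collapse_ws_sketch(char *s)
{
	size_t r, w = 0;
	int pending = 0; /* whitespace seen after at least one written char */

	for (r = 0; s[r] != '\0'; r++) {
		if (isspace((unsigned char)s[r])) {
			pending = (w > 0);
		} else {
			if (pending)
				s[w++] = ' ';
			pending = 0;
			s[w++] = s[r];
		}
	}
	s[w] = '\0';
	return w;
}

/* Hypothetical sketch: inspect the start of an encoded-word of the form
 * =?charset?encoding?text?= and return 'B' or 'Q' for the two encodings
 * RFC 2047 defines, accepting either case, or 0 for anything else. */
int
encoded_word_encoding(const char *word)
{
	const char *p;

	if (word[0] != '=' || word[1] != '?')
		return 0;
	p = strchr(word + 2, '?'); /* '?' that ends the charset field */
	if (p == NULL || p == word + 2 || p[1] == '\0' || p[2] != '?')
		return 0;
	switch (toupper((unsigned char)p[1])) {
	case 'B': return 'B';
	case 'Q': return 'Q';
	default:  return 0;
	}
}
```

With these, a subject like "  Re:\t [dev]   mail  archive \n" collapses to "Re: [dev] mail archive", and both "=?utf-8?q?...?=" and "=?UTF-8?Q?...?=" select the Q decoder.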
