On 04/13/2015 12:57 AM, Shahrukh Merchant wrote:
I have two discussion lists on the Argentine Tango that are probably
going to be suspended going forward owing to lack of activity in the
face of many competing technologies in recent years, but that have a
treasure of information dating back to 1994.
Sounds very cool.
I would like to get these onto mail-archive but there are some
peculiarities of the existing archives that I have some questions on.
Here are the questions:
1. First the easy one. From April 2006 to the present, the lists were
hosted using mailman, so I have the complete raw mailman archives that
I've downloaded. They are in one big mbox-format file (about 50 MB).
(a) This is I suppose the most straightforward since I just send a
pointer to these files and the mail-archive staff will do the rest,
correct? (b) And am I correct that the single file is best (rather
than the monthly gzipped files)? And (c) that the mail-archive software
will recreate threads as necessary?
In my experience, both mbox and monthly gzipped files work fine. Threads
were recreated fine in both cases.
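
Before sending a big mbox file anywhere, it may be worth a quick sanity
check that it parses and that the Date: headers cover the range you
expect. A minimal Python sketch using only the standard library; the
function name summarize_mbox is just an illustration, not part of any
mail-archive tooling:

```python
import mailbox
from email.utils import mktime_tz, parsedate_tz


def summarize_mbox(path):
    """Return (message_count, earliest_ts, latest_ts) for an mbox file.

    Timestamps are UTC epoch seconds taken from each Date: header;
    messages whose Date: header cannot be parsed are counted but
    ignored for the date range.
    """
    box = mailbox.mbox(path, create=False)
    count = 0
    stamps = []
    try:
        for msg in box:
            count += 1
            parsed = parsedate_tz(str(msg.get("Date", "")))
            if parsed is not None:
                stamps.append(mktime_tz(parsed))
    finally:
        box.close()
    earliest = min(stamps) if stamps else None
    latest = max(stamps) if stamps else None
    return count, earliest, latest
```

If the count is far below what you expect, the usual suspect is body
lines starting with "From " that were not escaped when the mbox was
written.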
2. Now the harder one. From Sep 1994 (inception) to Apr 2006, the
lists were hosted using L-Soft's LISTSERV software, which did not keep
archives. However, I have a complete set of all traffic from that time
period, but they are all in Daily Digest format, i.e., with a "Table
of Contents" in the front and several emails afterwards. I have MOST
(but not all) of these available as MIME digests with each message in
a different MIME multipart segment. I also have ALL of them available
as a non-MIME digest, with a fixed text separator (like a row of ----)
between messages. I would propose to send these as an mbox format of
digest files but each email in each digest message would still need to
be separated out. (a) Can mail-archive do this digest parsing, or do I
need to find or write a script to do this myself? (b) If mail-archive
can do it, do you have a preference for MIME vs. non-MIME digest? (c)
And if MIME, can you handle the few for which I only have non-MIME
digests?
I can't help with this one; skipping.
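
That said, if you end up scripting the digest splitting yourself, here
is a rough Python sketch of how it could go. It is not anything
mail-archive provides, and it rests on assumptions: that the non-MIME
digests separate messages with a line of at least 20 hyphens (adjust
SEPARATOR to whatever LISTSERV actually wrote), and that each message
in a MIME digest is a message/rfc822 part. All function names here are
made up for illustration.

```python
import email
import mailbox
import re
from email import policy

# Assumption: a separator is a line consisting only of 20+ hyphens.
SEPARATOR = re.compile(r"^-{20,}\s*$", re.MULTILINE)


def split_plain_digest(body):
    """Split the text of a non-MIME digest into message objects.

    Chunks without a From: or Date: header (such as the table of
    contents) are dropped.
    """
    messages = []
    for chunk in SEPARATOR.split(body):
        chunk = chunk.strip("\n")
        if not chunk.strip():
            continue
        msg = email.message_from_string(chunk + "\n", policy=policy.default)
        if msg["From"] or msg["Date"]:
            messages.append(msg)
    return messages


def split_mime_digest(part):
    """Return the messages embedded in a parsed MIME digest.

    Recurses through multipart containers but does not descend into
    the extracted messages, so forwarded attachments stay attached.
    """
    if part.get_content_type() == "message/rfc822":
        return [part.get_payload(0)]
    if part.is_multipart():
        found = []
        for sub in part.get_payload():
            found.extend(split_mime_digest(sub))
        return found
    return []


def append_to_mbox(messages, path):
    """Append message objects to an mbox file, creating it if needed."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

A message body that itself contains a long row of hyphens would be
split in the wrong place by the plain-text version, so where both
formats exist the MIME digests are probably the safer input.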
3. Must these old archives be processed by mail-archive in
chronological order in order for threading to work properly? Or if I
provide older ones later are they automatically inserted and
rethreaded appropriately?
They will be inserted and rethreaded appropriately.
4. The FAQ says that only the latest 3000 messages are kept live and
the rest are in "cold storage" and can be retrieved only via matching
searches. Some questions on this: (a) Are the "latest" based on when
they were processed by the archive software (e.g., old archives
processed recently would count as new)? Or (b) Are the "latest" based
on the Date: field of the post in question?
(b) is correct.
(c) Is there any way to get ALL messages live on mail-archive rather
than only 3000 so they can be browsed by month and year, for
example (e.g., by requesting an exception considering the list will be
mothballed and won't be expanding, or by paying a donation/fee)? There
is about 100 MB total of data per list, I'd guess.
Mail-archive support, when I asked about the 3000, was willing to talk
about exceptions. I didn't pursue it so I can't say more in detail.
(d) If not, is there a way I can get a full mirror download that
includes the "cold storage" older archives (after processing by
mail-archive's scripts) for me to install live on my own server (which
may or may not disappear) while mail-archive still keeps it more
permanently in their live+cold way?
I have to imagine that some tricks with wget or httrack should be able
to do this, despite the "cold storage" aspect, but I'm only guessing. I
would pursue the first question with mail-archive support and see what
happens.
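
To make the wget/httrack guess a bit more concrete, here is a small
Python sketch of the same idea: follow links from a starting page and
stay under one URL prefix. It is only a sketch. Whether "cold storage"
messages are reachable by following links at all is something I don't
know, the names archive_links and crawl are invented for illustration,
and you should check with mail-archive support before pointing any
crawler at their site.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def archive_links(html, page_url, prefix):
    """Return unique absolute links in html that start with prefix."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urldefrag(urljoin(page_url, href)).url
        if absolute.startswith(prefix) and absolute not in links:
            links.append(absolute)
    return links


def crawl(start_url, prefix, delay=1.0):
    """Yield (url, html) for each page reachable under prefix.

    Sleeps for `delay` seconds between requests to stay polite.
    """
    queue = [start_url]
    seen = {start_url}
    while queue:
        url = queue.pop(0)
        with urllib.request.urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        yield url, html
        for link in archive_links(html, url, prefix):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)
```

In practice wget or httrack will handle retries, asset downloads, and
link rewriting far better than this; the sketch only shows what
"mirror everything under one prefix" amounts to.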
_______________________________________________
Gossip mailing list
https://www.mail-archive.com/gossip@mail-archive.com
https://www.mail-archive.com/cgi-bin/mailman/options/gossip