Jeff,
Thanks again--more things are clear now. But your response raises more
questions in my mind as well. Please bear with me; I think we're almost
there.
Keeping in mind that I am splitting Digests into individual messages, I
have to fake whatever headers are not already there within the
individual messages. In the case of my digests, the existing headers are
only:
Date:
From:
Subject:
To this I add a ^From_ line just before the Date: line to make it mbox
format. (This line is taken from the Digest header and copied in front
of each email in the digest, so it's identical for several emails.)
An example of this ^From_ line is this:
From - Mon Nov 28 11:00:05 2005
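To make the step above concrete, here is a minimal Python sketch of how I
understand it, not a description of the archive's importer. The function
name to_mbox is hypothetical, and the ">From " escaping of body lines is
the usual mbox convention that I am assuming applies here; it is not
something confirmed in this thread.

```python
# Hypothetical helper: prepend the same ^From_ line to each split-out
# digest message and join the results into one mbox-style string.
FROM_LINE = "From - Mon Nov 28 11:00:05 2005"  # example line from this email


def to_mbox(messages, from_line=FROM_LINE):
    """Return one mbox-style string built from already-split messages."""
    parts = []
    for msg in messages:
        lines = msg.strip("\n").split("\n")
        # Assumption: escape lines starting with "From " so they are not
        # mistaken for message separators. "From:" headers are unaffected.
        lines = [">" + ln if ln.startswith("From ") else ln for ln in lines]
        parts.append(from_line + "\n" + "\n".join(lines) + "\n")
    # A blank line separates consecutive messages.
    return "\n".join(parts)
```

Each entry in messages is expected to hold only the Date:, From: and
Subject: headers, a blank line, and the body.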
Now, with this background, here are my further questions:
1. I thought I would have to add a "To:" line to my "faked" headers, but
you are saying that it is never used. Is it OK if the To: line is
completely absent?
> The only things indexed for search are: message-id,
2. Will it choke if there is no Message-ID? I know this is used in
threading, and in an earlier email you said that in the absence of a
Message-ID it would thread on Subject. I just want to make sure it
won't reject an email without a Message-ID field.
> ... subject ... sender name (extracted from From: header)
No issue on these.
> ... date (usually extracted from the Received: header)
3. Hmm, again, there is no Received header. Will it properly take the
date from the Date: field in that case and not choke on the absence of
any Received headers? The date field format (from an actual example) is:
Date: Mon, 28 Nov 2005 17:55:14 +1100
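As a sanity check on my side, the sketch below only shows that this Date:
value is in the standard RFC 2822 format and parses with Python's
standard library. It says nothing about how the archive itself reads
dates; that is still my question.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# The Date: value from the example above, without the header name.
raw_date = "Mon, 28 Nov 2005 17:55:14 +1100"

# parsedate_to_datetime returns a timezone-aware datetime when the
# value carries a numeric UTC offset such as +1100.
parsed = parsedate_to_datetime(raw_date)

print(parsed.isoformat())                           # 2005-11-28T17:55:14+11:00
print(parsed.astimezone(timezone.utc).isoformat())  # 2005-11-28T06:55:14+00:00
```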
> ... posting address (for example, gossip@mail-archive.com ... Every message is
> sorted and organized according to posting address ... but the To: header
> is never indexed for search, never used during import, and there is no
> benefit for you to adjust it.
4. This is where I'm most confused now. Where *is* the posting address
extracted from if not from the To: header? Is it an internal field in
your archive message database that is (a) predetermined manually in an
import and (b) mapped to a fixed, internally stored name for new
incoming email (based on headers including To:), and nothing else? In my
earlier email I had asked about the varying forms of To: addresses in my
old archives that need to be imported (e.g., tang...@mit.edu,
tang...@mitvma.mit.edu, ta...@mitvma.mit.edu) and whether they would
confuse search, since I incorrectly imagined that search would look at
the To: line. You had replied:
> Search will have no concept of alternative list names. There is no reasonable
> way to overcome this.
but now you say search never looks at the To: lines and they aren't used
in imports either. So in light of your latest response I don't
understand now why search would have an issue of "alternative list
names"--they are alternative To: lines but the same list, and the
variations exist only in archives--new email would have a consistent To:
line reflecting the current posting address.
> A merged archive will have the same posting
> address for every message, with no memory about what life was like
> before the merge.
OK, this is consistent with the "To:" line never being used for search.
So the "l=" parameter in the search would always have to be the new list
name following a merge, correct? That shouldn't be an issue.
Shahrukh
_______________________________________________
Gossip mailing list
https://www.mail-archive.com/gossip@mail-archive.com
https://www.mail-archive.com/cgi-bin/mailman/options/gossip