Jeff,

Thanks again; more things are clear now, but your response has raised further questions in my mind as well. Please bear with me; I think we're almost there.

Keep in mind that I am splitting digests into individual messages, so I have to fake whatever headers are not already present in each message. In my digests, the only existing headers are:

Date:
From:
Subject:

To these I add a ^From_ line just before the Date: line to put each message into mbox format. (The From_ line is taken from the digest header and copied in front of each message in the digest, so it is identical for several messages.)

Here is an example of such a ^From_ line:
        From - Mon Nov 28 11:00:05 2005
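For anyone doing the same conversion, here is a minimal Python sketch of the step above. It assumes the digest has already been split into per-message strings of the form "headers, blank line, body", and the function name build_mbox is hypothetical. It reuses one From_ line for every message and applies mboxrd-style quoting (an extra ">" in front of body lines starting with "From "), because an unquoted "From " at the start of a body line would otherwise be read as the start of a new message.

```python
import re

# mboxrd quoting: any body line matching ^>*"From " gets one more ">".
_FROM_LINE = re.compile(r"^(>*From )", re.MULTILINE)


def build_mbox(from_line: str, messages: list[str]) -> str:
    """Join split-out digest messages into one mbox string (hypothetical helper).

    from_line: the envelope line taken from the digest header, e.g.
               "From - Mon Nov 28 11:00:05 2005"; reused for every message.
    messages:  each message as "headers\\n\\nbody", already split from the digest.
    """
    if not from_line.startswith("From "):
        raise ValueError("mbox separator line must start with 'From '")
    out = []
    for msg in messages:
        headers, sep, body = msg.partition("\n\n")
        # Quote only the body; a header line such as "From:" never starts with "From ".
        body = _FROM_LINE.sub(r">\1", body)
        entry = from_line + "\n" + headers + sep + body
        if not entry.endswith("\n"):
            entry += "\n"
        out.append(entry + "\n")  # blank line before the next From_ line
    return "".join(out)
```

The same From_ line appearing before several messages is harmless to mbox readers, since it only marks where each message begins.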

Now, with this background, here are my further questions:

1. I thought I would have to add a "To:" line to my "faked" headers, but since you say it is never used, is it OK if the To: line is completely absent?

> The only things indexed for search are: message-id,

2. Will it choke if there is no Message-ID? I know this header is used for threading, and in an earlier email you said that in the absence of a Message-ID it would thread on Subject, but I just want to make sure it won't reject a message that lacks a Message-ID field.

> ... subject ... sender name (extracted from From: header)

No issues with these.

> ... date (usually extracted from the Received: header)

3. Hmm, again, there is no Received: header. In that case, will it correctly take the date from the Date: field rather than choke on the missing Received: headers? The Date: field format (from an actual message) is:
        Date:    Mon, 28 Nov 2005 17:55:14 +1100
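To make question 3 concrete, here is a hypothetical Python sketch of the fallback being asked about: use the date from the topmost Received: header when one exists, otherwise the Date: header. This is not the archive's actual code, and the name message_date is made up; it uses only the standard-library email package. It does show that a Date: value like the one above is standard RFC 2822 syntax that common parsers accept, including the extra spaces after the colon.

```python
from datetime import datetime
from email.parser import HeaderParser
from email.utils import parsedate_to_datetime


def message_date(raw: str) -> datetime:
    """Hypothetical: date from the topmost Received: header, else from Date:."""
    msg = HeaderParser().parsestr(raw)
    received = msg.get_all("Received") or []
    if received:
        # In a Received: header the timestamp follows the last ';'.
        stamp = received[0].rpartition(";")[2].strip()
        try:
            return parsedate_to_datetime(stamp)
        except (TypeError, ValueError):
            pass  # unparseable Received: date; fall back to Date:
    date = msg.get("Date")
    if date is None:
        raise ValueError("message has neither a Received: nor a Date: header")
    return parsedate_to_datetime(date.strip())
```

With only the three digest headers present, this sketch takes the Date: branch and returns 2005-11-28 17:55:14 at offset +11:00.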

> ... posting address (for example, gossip@mail-archive.com ... Every message is
> sorted and organized according to posting address ... but the To: header
> is never indexed for search, never used during import, and there is no
> benefit for you to adjust it.

4. This is where I'm most confused now. Where *is* the posting address extracted from, if not from the To: header? Is it an internal field in your archive's message database that is (a) set manually during an import, or (b) for new incoming email, mapped to a fixed, internally stored name (based on headers including To:), and nothing else? In your earlier response, I had asked whether the varying forms of To: addresses in the old archives I need to import (e.g., tang...@mit.edu, tang...@mitvma.mit.edu, ta...@mitvma.mit.edu) would confuse search, since I wrongly imagined that search would look at the To: line. You replied:

> Search will have no concept of alternative list names. There is no reasonable
> way to overcome this.

But now you say that search never looks at the To: lines and that they aren't used in imports either. In light of your latest response, I no longer understand why search would have a problem with "alternative list names." They are alternative To: lines for the same list, and the variations exist only in the old archives; new email will have a consistent To: line reflecting the current posting address.

> A merged archive will have the same posting
> address for every message, with no memory about what life was like
> before the merge.

OK, this is consistent with the "To:" line never being used for search. So the "l=" parameter in the search would always have to be the new list name following a merge, correct? That shouldn't be an issue.

Shahrukh

_______________________________________________
Gossip mailing list
https://www.mail-archive.com/gossip@mail-archive.com
https://www.mail-archive.com/cgi-bin/mailman/options/gossip
