Just as a FWIW, I'm at ApacheCon at the moment, so my time for PM is
limited. I'll reply at length next week when I get back. On the whole, I
agree with your thoughts.

With regards,
Daniel.

On 11/18/2016 12:28 PM, sebb wrote:
> At present if the archiver cannot extract what it considers
> displayable body text from the input, it may drop the message
> entirely.
> 
> For example, this currently happens for valid messages such as
> HTML-only (Nexus) and for signature-only messages [1], [2]. (*)
> 
> This does not make sense for an archiver.
> 
> At the very least it should index the raw source, and put some kind of
> marker in the summary record to show that it could not understand the
> message structure.
> 
> I don't think it would be good to store the raw source in the body
> field itself. That would mess up searches and statistics.
> 
> Rather than add a separate flag, it occurs to me that this would be a
> good use for storing the body as 'null'.
> 
> It's not possible for a real message to have a null body - it may be
> empty, but it cannot be null.
> 
> The GUI could then be fixed to display a standard message explaining
> that the message cannot be displayed, as is done by mod_mbox. At least
> then readers can look at the source itself.
> 
> Also, when the parser is improved to deal with more message layouts,
> it would be easy to find such emails and re-index them.
> 
> Does that make sense?
> 
> [1] 
> http://mail-archives.apache.org/mod_mbox/httpd-users/201212.mbox/%3Ce9bd8c2b31947867d3bf1174d27b8e01%40mail.gmail.com%3E
> 
> [2] 
> http://mail-archives.apache.org/mod_mbox/ofbiz-dev/201505.mbox/%3C26E553E8-7C7F-4CB3-A553-EE7487E2BC9C%40ecomify.de%3E
> 
> (*) HTML-only messages are currently dropped if html2text is not available.
> Signature-only messages are dropped because the code only checks multipart
> messages for attachments.
> Both of these can be fixed, now that they are known. But other issues
> may arise in future.
> 
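
For reference, the null-body idea quoted above could look roughly like the sketch below. It is only an illustration using Python's standard email package, not the archiver's actual code: the names extract_body, build_record, display_text, needs_reindex and PLACEHOLDER are all hypothetical, and it assumes that only a text/plain part counts as displayable.

```python
from email import message_from_bytes, policy
from typing import List, Optional

# Hypothetical wording; the GUI would show this instead of a body.
PLACEHOLDER = "This message could not be displayed. Please view the raw source."


def extract_body(raw: bytes) -> Optional[str]:
    """Return the text/plain body, or None if no displayable part is found."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None  # parser could not find displayable text
    return part.get_content()  # may legitimately be an empty string


def build_record(raw: bytes) -> dict:
    """Always index the message; a None body marks 'could not parse'."""
    return {
        "source": raw.decode("ascii", errors="replace"),
        "body": extract_body(raw),
    }


def display_text(record: dict) -> str:
    """What a reader sees: the body, or a standard notice when body is None."""
    return PLACEHOLDER if record["body"] is None else record["body"]


def needs_reindex(records: List[dict]) -> List[dict]:
    """Find records to retry once the parser handles more layouts."""
    return [r for r in records if r["body"] is None]
```

An empty body stays an empty string, so None remains unambiguous as the "not understood" marker, and searches and statistics over the body field are not polluted with raw source.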
