On 03/03/2011 09:43 AM, Clayton Keller wrote:
Mark,

I have been struggling with handling some pesky encoded characters in
mail logs as of late.

My issue involves passing the message on to a remote rsyslog server
which then processes the messages into a database. From time to time I
see messages with something similar to the following:

"Subject:<RE: VIAGRA \256 Official Site ID031831740>"

In my case, messages are passed to processing mechanisms via triggers
prior to a final insertion but I am getting DB errors based on the
handling of these "invalid" characters.

I'm not sure how you are getting an 8-bit character out of the log.
Amavisd logging encodes all nonprintable characters as \nnn or \{xxxx},
so syslog should not be seeing any. If your syslog daemon decodes that,
it must not assume the decoded string will be a valid string in some
encoding such as UTF-8, as the sender can put any junk in his subject
or display name, and need not play by the rules.
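Here is a minimal sketch, in Python, of what a receiving-side decoder might do. It assumes that \nnn is three octal digits standing for one byte, as the \256 in the Subject above suggests, and that \{xxxx} holds a hexadecimal code point. Both are assumptions about the log format, and the function names are hypothetical. The step that matters is the last one: the decoded bytes are turned into text with errors="replace" instead of being trusted to be valid UTF-8.

```python
import re

# Assumption: each escaped byte is a backslash plus three octal digits
# (\000-\377), and wider characters appear as \{xxxx} with a hex code point.
_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})|\\\{([0-9A-Fa-f]{1,6})\}")


def unescape_log_field(raw: bytes) -> bytes:
    """Turn \\nnn and \\{xxxx} escapes back into raw bytes."""
    def repl(m):
        if m.group(1) is not None:
            # Octal escape: exactly one byte.
            return bytes([int(m.group(1), 8)])
        cp = int(m.group(2), 16)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            # Not a valid Unicode scalar value: leave the escape untouched.
            return m.group(0)
        # Hypothetical handling: store the code point as UTF-8 bytes.
        return chr(cp).encode("utf-8")
    return _ESCAPE.sub(repl, raw)


def to_safe_text(raw: bytes) -> str:
    """Decode without trusting the sender: invalid UTF-8 becomes U+FFFD."""
    return unescape_log_field(raw).decode("utf-8", errors="replace")


print(to_safe_text(rb"Subject:<RE: VIAGRA \256 Official Site ID031831740>"))
```

The lone byte 0xAE produced by \256 is not valid UTF-8 by itself, so it comes out as U+FFFD rather than raising an error or reaching the database as an invalid sequence.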


Good points. This is the stock syslog on CentOS 5; I will look into
this area too.

I don't think the use of mime2utf8 will help me in these types of
instances.

I think mime2utf8 comes closest to what can be achieved when decoding
text in the Subject and From header fields. In any case, mime2utf8
should always be able to produce valid UTF-8-encoded text (although
not necessarily printable text), which is further protected by the log
writing code turning nonprintable characters into \nnn.
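The escaping step can be pictured roughly as follows. This is not amavisd's actual log-writing code. It is a hypothetical Python sketch that takes already-valid text (such as mime2utf8 output), encodes it as UTF-8, and writes every byte outside printable ASCII as a backslash plus three octal digits. Escaping the backslash itself (as \134) is an extra assumption of this sketch, made so that the result can be unescaped unambiguously.

```python
def escape_for_log(text: str) -> str:
    """Render text as printable ASCII, escaping everything else as \\nnn.

    Sketch only: each byte of the UTF-8 encoding that is not printable
    ASCII becomes a backslash plus three octal digits. The backslash
    itself is escaped too (as \\134) so the output is unambiguous.
    """
    out = []
    for b in text.encode("utf-8"):
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))          # printable ASCII, kept as-is
        else:
            out.append("\\%03o" % b)    # e.g. byte 0xAE -> \256
    return "".join(out)


print(escape_for_log("RE: VIAGRA \u00ae Official Site"))
```

Because the output is pure printable ASCII, a syslog daemon that passes it through untouched cannot introduce an invalid byte sequence.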

If logging gets further processing, you may prefer [:b64encode[:mime2utf8...
instead of [:dquote|[:mime2utf8...  in the log template.
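On the receiving side, a base64-encoded field only has to be decoded once and then treated as UTF-8. Below is a minimal Python sketch. It assumes the template emits standard base64 of the UTF-8 bytes (the exact output format of b64encode in the log template is not confirmed here), and decode_b64_field is a hypothetical name.

```python
import base64


def decode_b64_field(field: str) -> str:
    """Decode a base64-encoded log field back to text.

    Assumption: the field is standard base64 of UTF-8 bytes. Whitespace is
    stripped first; malformed base64 raises binascii.Error (a ValueError).
    """
    raw = base64.b64decode("".join(field.split()), validate=True)
    # mime2utf8 output should already be valid UTF-8, but stay defensive.
    return raw.decode("utf-8", errors="replace")


encoded = base64.b64encode("RE: VIAGRA \u00ae Official Site".encode("utf-8")).decode("ascii")
print(decode_b64_field(encoded))
```

The encoded field contains only [A-Za-z0-9+/=]. This leaves neither syslog nor the database layer any backslashes or 8-bit bytes to misinterpret along the way.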


I had the same idea last night. I am going to test it against live
data and see what results it gives.

Also, while looking through previous mailing list messages, I've seen
references to somewhat similar issues involving setting the LANG= value
when invoking amavisd-new. I may not be thinking this through clearly,
but I don't think that would help much in this case, since everything
happens after the fact on a completely different system.

That's unrelated. Setting locale to "C" on a mailer is still a sensible
thing to do in principle.

Would I be better off doing some type of find-and-replace regex on the
fields in the log template to escape these characters? Is that even
possible? Typically I'd do this in the code called by the trigger, but
I'm not getting that far along in the processing yet. My thinking at
this point is to escape any backslashes before I write the log, and then
hopefully I can handle the escaped characters elsewhere.
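The plan of escaping backslashes before the log is written can be sketched in Python as a doubling step with an exact inverse. The helper names are hypothetical. The sketch assumes that the downstream stage (for example, a database string literal that treats backslash as an escape character) undoes exactly one level of doubling.

```python
def escape_backslashes(text: str) -> str:
    """Double every backslash so a later stage treats it as a literal one."""
    return text.replace("\\", "\\\\")


def unescape_backslashes(text: str) -> str:
    """Undo escape_backslashes(): every doubled backslash becomes single."""
    return text.replace("\\\\", "\\")


subject = r"Subject:<RE: VIAGRA \256 Official Site ID031831740>"
escaped = escape_backslashes(subject)
print(escaped)  # Subject:<RE: VIAGRA \\256 Official Site ID031831740>
```

Where the database driver supports parameterized queries, passing values as parameters generally removes the need for this kind of manual escaping.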

Are you replacing the write_log() with your own code?


Not at this time, no.

I see the following noted in the README.customize file:

"If assigning to variables, care must be taken to properly quote certain
special characters (like backslash), as required by Perl quoting rules.
Text read from amavisd file or from external files is not subject
to Perl quoting rules."

But what is the best practice for doing so within the templates and the
available macros? Or do I need to consider raw Perl within the
template?

No interpretation/decoding of characters in templates is done by
amavisd - it uses the text as provided in a variable. How you put it
into a variable depends on your config file: assigning a "..." or '...'
enclosed text to a variable is subject to Perl's interpretation of
qq() and q() or a 'here-document'. Or you may be reading the
text from a file, in which case no interpretation occurs.
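Python makes a comparable distinction, which may help picture the three cases. The following is only an analogy in Python, not Perl and not amavisd code. An interpreted literal turns \256 into one character, much as Perl's "..." or qq() does. A raw literal keeps the backslash, much as '...' or q() does. Text read from a file is left alone.

```python
import tempfile
from pathlib import Path

# Interpreted literal, roughly like Perl's "..." or qq(): the octal escape
# \256 is turned into a single character (U+00AE).
interpreted = "VIAGRA \256"

# Raw literal, roughly like Perl's '...' or q(): backslash and digits are kept.
raw = r"VIAGRA \256"

# Text read from a file is taken as-is, with no escape processing at all.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "template.txt"
    path.write_text(raw, encoding="ascii")
    from_file = path.read_text(encoding="ascii")

print(len(interpreted), len(raw), from_file == raw)  # prints: 8 11 True
```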

After additional troubleshooting and trial and error, it is looking more
like I need to take a simpler, find-and-replace approach to the content
passed to amavisd-new.

Looking through README.customize, I don't see any predefined macros
that do such a thing, so I am considering a custom macro to do just
this.

I wanted to throw this out to the list to make sure I'm not overlooking
something obvious or already available.

I can adapt the macro mime2utf8 or a similar one if you can explain
what it is supposed to do in place of its current function.


I'm not sure that is necessary at this time, but it was something I was
toying with doing here as well. Again, I am going to allow more time
for testing on my end. Unfortunately, I can't yet tell you exactly what
else I would want it to do. Hopefully I can answer that better as I get
more data.

Thanks again for your response. After I do more troubleshooting/testing
on my side I will report any findings that may be beneficial to the
discussion.

Clay



Just a follow-up on this thread. Mark, by using mime2utf8 together with better and "earlier" decoding to Unicode, plus better handling of backslash escaping on my end, my testing shows that I can handle the logging oddities we have been seeing much better than before, and writes to PostgreSQL produce far fewer issues.

Thanks again for your time and ideas.

Clay
