Re: mhfixmsg character set conversion

Steven Winikoff Tue, 08 Feb 2022 00:55:24 -0800

>I'm unable to replicate your problem here with the original message,
>and using your mhfixmsg invocation, mhfixmsg-format-text/html, and
>locale.  The only piece I think I'm missing is your mime_helper.
>I would give that a try if you send it to me.


I've attached the script, but (without having looked at it in a while) I
suspect it depends too heavily on other parts of my personal setup to be
usable by anyone else.  It turns out not to be relevant, but it might
interest someone anyway.


>With nmh-1.7 mhfixmsg:
>mhfixmsg: /home/levine/src/nmh/msg part 2, decode text/plain; 
>charset=iso-8859-1
>mhfixmsg: /home/levine/src/nmh/msg part 1, will not decode because it
>is binary (line length > 998)
>mhfixmsg: /home/levine/src/nmh/msg part 2, convert UTF-8 to UTF-8

...and therein lies the answer.

I owe you an apology about this, and I'm sincerely sorry for wasting your
time on this question.

The key is the message about the line length being too long.  Seeing that
reminded me that I'd modified the stock 1.7.1 mhfixmsg with this patch:

   --- uip/mhfixmsg.c.original     2018-03-06 14:05:56.000000000 -0500
   +++ uip/mhfixmsg.c      2019-08-17 19:51:25.723267048 -0400
   @@ -2144,13 +2144,13 @@
                int last_char_was_cr = 0;

                for (i = 0, cp = buffer; i < inbytes; ++i, ++cp) {
   -                if (*cp == '\0'  ||  ++line_len > 998  ||
   +                if (*cp == '\0'  ||  ++line_len > 99998  ||
                        (*cp != '\n'  &&  last_char_was_cr)) {
                        encoding = CE_BINARY;
                        if (*cp == '\0') {
                            *reason = "null character";
   -                    } else if (line_len > 998) {
   -                        *reason = "line length > 998";
   +                    } else if (line_len > 99998) {
   +                        *reason = "line length > 99998";
                        } else if (*cp != '\n'  &&  last_char_was_cr) {
                            *reason = "CR not followed by LF";
                        } else {

I remember asking about the 998-character limit on this list, in a thread
from January 2018.  You explained why the limit exists and suggested
another way to achieve what I was trying to do.  I tried it, but without
success; I couldn't get what I wanted without this change, though I no
longer remember the details.
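To check in advance whether a message will trip this test, one could list
its over-long lines.  This is only a minimal sketch: it assumes a POSIX
awk, the function name long_lines is hypothetical, and mhfixmsg's own count
(which walks the raw buffer and may include the line terminator) could
differ from awk's by a character or so.

```shell
# Hypothetical helper, not part of nmh: print "line-number: length" for
# every line of a message file longer than 998 octets, the RFC 5322
# limit that stock mhfixmsg reports as "line length > 998".
# LC_ALL=C makes awk count bytes rather than multibyte characters.
long_lines()
{
   LC_ALL=C awk 'length($0) > 998 { print NR ": " length($0) }' "$1"
}
```

For example, long_lines "$(mhpath cur)" would print nothing for a message
whose lines all stay within the limit.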

Obviously I need to revisit this question, because I just compiled a copy
of mhfixmsg from 1.7.1 without this patch, and it now behaves as you'd
expect:  it complains about the line length, and then generates correct
output with these headers:

   Content-Type: multipart/alternative;
        boundary=0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   
   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/plain; charset="UTF-8"
   Mime-Version: 1.0
   
   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: quoted-printable
   Content-Type: text/html; charset=iso-8859-1
   Mime-Version: 1.0

With my patch, I get these headers:

   Content-Type: multipart/alternative;
      boundary=0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit

   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/plain; charset="UTF-8"
   Mime-Version: 1.0

   --0126698af956ff6e1d4da4d88ae8ef4ebfb0beb8c16cc29b787641a31378
   Content-Transfer-Encoding: 8bit
   Content-Type: text/html; charset=iso-8859-1
   Mime-Version: 1.0

There's still something going on that I don't understand, however.  I've
been evaluating mhfixmsg's output by viewing it in vim, and there's no
question that the unpatched output looks fine while the patched output
looks as I've been describing since the beginning of this thread.

...but when I look at the files with command-line tools such as more or
head, *both* versions look correct.  When I open both files in xed, the
unpatched file is fine, but the patched file generates this message:

   There was a problem opening the file /tmp/nmh_testing/xxx.

   The file you opened has some invalid characters. If you continue editing
   this file you could corrupt this document.

   You can also choose another character encoding and try again.

...with a menu offering "Automatically Detected", "Current Locale (UTF-8)"
and "Western (ISO-8859-15)" as possible character encodings.
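One way to confirm which file contains bytes that aren't valid UTF-8
(which is what xed's warning suggests) is to have iconv, which the
attached script already uses, convert the file from UTF-8 to UTF-8 and see
whether it fails.  A minimal sketch, assuming a glibc-style iconv on the
PATH; is_valid_utf8 is a hypothetical name.

```shell
# Hypothetical helper: succeed (exit status 0) if the whole file is valid
# UTF-8, fail otherwise.  iconv exits non-zero when it meets a byte
# sequence that is not valid in the source encoding; its output and
# error messages are discarded because only the exit status matters here.
is_valid_utf8()
{
   iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For instance, is_valid_utf8 /tmp/nmh_testing/xxx || echo "not valid UTF-8".
If only the patched output fails, that would be consistent with its
text/html part now being 8bit while still labelled charset=iso-8859-1,
leaving raw ISO-8859-1 bytes next to the UTF-8 text/plain part, though
that remains to be confirmed.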

In summary, I now know what's happening and (mostly) what to do about it,
but I still don't know why.

     - Steven
-- 
___________________________________________________________________________
Steven Winikoff      |
Montreal, QC, Canada | "I'd love to go out with you, but I'm
[email protected]     |  attending the opening of my garage door."
http://smwonline.ca  |
                     |                           - fortune(6)

#!/bin/sh
#
#  mime_helper -- help MH display MIME attachments
#
#  Steven Winikoff
#  2009/04/17
#  2018/01/13 -- rewrite for nmh-1.7
#
#  This is intended to be invoked from .mh_profile, with entries similar to
#  this one:
#
#     mhshow-show-application: %pmime_helper %F %s "%{name}"
#
#  where the % escapes are interpreted as follows (see mhshow(1) for
#  details):
#
#     %F  exclusive execution, insert filename containing content, and
#         stdin is terminal not content
#
#     %s  Insert content subtype
#
#  for example, here are some sample values of these strings:
#
#         %F = /home/smw/Mail/mhshowk4vGey.pdf
#         %s = pdf
#
#         %F = /home/smw/Mail/mhshowyME4Gs.docx
#         %s = vnd.openxmlformats-officedocument.wordprocessingml.document
#
#         %F = /home/smw/Mail/mhshowxQPJe6.xlsx
#         %s = vnd.openxmlformats-officedocument.spreadsheetml.sheet
#
#         %F = /home/smw/Mail/mhshowpW8QGd
#         %s = vnd.ms-powerpoint
#
#         %F = /home/smw/Mail/mhshowwRBwpH.jpeg
#         %s = jpeg
#
#         %F = /home/smw/Mail/mhshowTs8Yaf.gif
#         %s = gif
#
#         %F = /home/smw/Mail/mhshowtXWO8m.png
#         %s = png
#
#         %F = /home/smw/Mail/mhshowuQrPmF
#         %s = octet-stream
#         [ file reports
#             "Composite Document File V2 Document, Little Endian, Os:
#              Windows, Version 5.1, Code page: 1252, Title: PLAYLIST
#              APRIL 18 ,2009, Author: lou, Template: Normal, Last Saved
#              By: lou, Revision Number: 10, Name of Creating Application:
#              Microsoft Word 8.0, Total Editing Time: 04:07:00, Create
#              Time/Date: Thu Apr 16 16:35:00 2009, Last Saved Time/Date:
#              Thu Apr 16 23:52:00 2009, Number of Pages: 1, Number of
#              Words: 923, Number of Characters: 5264, Security: 0"
#         ]
#
#  note that this script (intentionally) leaves unpacked attachments
#  in /tmp/attachments on the machine where the attachment is opened;
#  these should be cleaned up either manually or via cron (e.g. from
#  /local/misc/daily_cleaner)
#
#--------------------------------------------------------------------------
#  basic setup:

trace=0

ruler="+------------------------------------------"
ruler="${ruler}-----------------------------------"

attach_dir="/tmp/attachments"
user="${USER-smw}"

scp_options="-p -r -B -o ForwardAgent=no -o ForwardX11=no"

decoder="${SMW}/bin/rfc2047decoder"

MH_top=`mhpath +`


#--------------------------------------------------------------------------
#  shell function to try to guess a file's type based on the output
#  of file(1); this is needed only when we don't receive a useful
#  subtype from the message -- and nobody will be surprised to learn
#  that this usually happens with Microsoft software :-/

guess_file_type()
{
   echo "$1" | cut -d/ -f2
}


#--------------------------------------------------------------------------
#  are we connected locally or remotely?  ("yes" :-)

this_host=`hostname`

desktop="${ORIGINATING_HOST}"
[ -z "${desktop}" -o  "${desktop}" = ${this_host} ] && desktop="${REMOTEHOST}"

local=0
[ -z "${desktop}" -o  "${desktop}" = ${this_host} ] && local=1


#--------------------------------------------------------------------------
#  grab command line parameters:

case "$1" in
   -i*) #-- special case:  invoked directly by mhread for an HTML message,
        #   bypassing mhshow entirely; in this case, the first argument will
        #   always be either -il or -ia (and in either case, should be passed
        #   on to view_html_message), and the last argument will always be
        #   the full path of the message to be viewed

        imgs="${1}"   # -il or -ia

        for i in "$@"; do sourcefile="${i}"; done   # last cmd line argument

        subtype="html"
        filename="`echo ${sourcefile} | sed s%${MH_top}/%%\;s%/%_%g`"
        file_output="text/html"
        ;;

     *)  sourcefile="$1"   # mhshow %F
            subtype="$2"   # mhshow %s
           filename="$3"   # mhshow "%{name}"
        file_output=`file --brief --mime-type "$1"`
        ;;
esac


#--------------------------------------------------------------------------
#  strip out all \ characters in the content filename, along with leading
#  and trailing ' characters; also consolidate any remaining runs of
#  ' characters into one single ':

filename="`echo \"${filename}\" | tr -d '\134' | \
           sed -r 's/^'"'"'*//;s/'"'"'*$//;s/'"'"'+/'"'"'/g'`"


#--------------------------------------------------------------------------
#  decode the content filename if appropriate:

prefix=`echo "${filename}" | cut -c1-2`

if [ "${prefix}" = "=?" ]
then
   #-- yes, this filename was encoded in RFC 2047 format; decode it:

   charset=`echo "${filename}" | cut -d'?' -f2 | tr A-Z a-z`
   filename=`echo "${filename}" | ${decoder}`


   #-- and translate to utf-8 if it isn't already:

   if [ "${charset}" != "utf-8" ]
   then
      filename=`echo "${filename}" | iconv -f "${charset}" -t utf-8`
   fi
fi


#--------------------------------------------------------------------------
#  strip out any leading and trailing spaces:

filename="`echo \"${filename}\" | sed 's/^ *//;s/ *$//'`"


#--------------------------------------------------------------------------
#  figure out what type of file we have, and by extension (pun unintended,
#  for a change :-), which application we need to open it:

case "${subtype}" in
          x-awk) subtype="`guess_file_type \"${file_output}\"`" ;;
   octet-stream) subtype="`guess_file_type \"${file_output}\"`" ;;
       download) subtype="`guess_file_type \"${file_output}\"`" ;;
        unknown) subtype="`guess_file_type \"${file_output}\"`" ;;
esac

subtype=`echo "${subtype}" | tr A-Z a-z`

case "${subtype}" in

   *htm*)         app="view_html_message ${imgs}" ;;

   *bmp*)         app="/usr/bin/xviewer -n"       ;;
   *gif*)         app="/usr/bin/xviewer -n"       ;;
   *jpeg*)        app="/usr/bin/xviewer -n"       ;;
   *jpg*)         app="/usr/bin/xviewer -n"       ;;
   *png*)         app="/usr/bin/xviewer -n"       ;;
   *tiff*)        app="/usr/bin/xviewer -n"       ;;
                                                  
   *m4a*)         app="/usr/bin/vlc"              ;;
   *mov*)         app="/usr/bin/vlc"              ;;
   *mp*4*)        app="/usr/bin/vlc"              ;;
   *mp3*)         app="/usr/bin/vlc"              ;;
   *mpeg*)        app="/usr/bin/vlc"              ;;
   *quicktime*)   app="/usr/bin/vlc"              ;;
   *wav*)         app="/usr/bin/vlc"              ;;
   *wmv*)         app="/usr/bin/vlc"              ;;
                                                  
   *pdf*)         app="/usr/bin/atril"            ;;  # was evince, xreader
   *.ps*)         app="/usr/bin/atril"            ;;  # was gv -media=letter
                                                  
   *tnef*)        app="unpack_losemail_dat"       ;;  # was tnef -f
                                                  
   *zip*)         app="file-roller"               ;;
                                                  
   *composite*)   app="soffice"                   ;;
   *csv*)         app="soffice"                   ;;
   *doc*)         app="soffice"                   ;;
   *excel*)       app="soffice"                   ;;
   *icrosoft*)    app="soffice"                   ;;
   *powerpoint*)  app="soffice"                   ;;
   *openxml*)     app="soffice"                   ;;
   *pps*)         app="soffice"                   ;;
   *ppt*)         app="soffice"                   ;;
   *rtf*)         app="soffice"                   ;;
   *vnd.ms*)      app="soffice"                   ;;
   *vsd*)         app="soffice"                   ;;
   *word*)        app="soffice"                   ;;
   *xls*)         app="soffice"                   ;;
                                                  
   *calendar*)    app="calendar_extract"          ;;
   *ics)          app="calendar_extract"          ;;
                                                  
   *txt)          app="more"                      ;;
   *plain)        app="more"                      ;;
                                                  
   *)             app="UNKNOWN"                   ;;
esac


#--------------------------------------------------------------------------
#  trace what's going on:

if [ ${trace} -gt 0 ]
then
   echo
   echo "** \$1 (%F)    = [${sourcefile}]"
   echo "** \$2 (%s)    = [${subtype}]"
   echo "** \$3 (%name) = [${filename}]"
   echo "** \`file\`     = ${file_output}"
   echo "** subtype    = [${subtype}]"
   echo "** app        = ${app}"
   echo
   echo "(see mhshow(1), mhlist(1) and ~/.mh_profile)"
fi

[ ${trace} -gt 1 ] && exit 0


#--------------------------------------------------------------------------
#  bail out if we can't recognize this file type:

if [ "${app}" = "UNKNOWN" ]
then
   echo "${ruler}"
   echo "| unrecognized attachment type; details are as follows:"
   echo "|"
   echo "|    \$1 (%F)    = [$1]"
   echo "|    \$2 (%s)    = [$2]"
   echo "|    \$3 (%name) = [$3]"
   echo "|    subtype     = [${subtype}]"
   echo "|"
   echo "|    file reports:  ${file_output}"
   echo "|"
   echo "| edit $0 to recognize this file type in future"
   echo "${ruler}"
   exit 2
fi


#--------------------------------------------------------------------------
#  finally, open the file:
#
#     text and calendar attachments are viewed in the foreground on the
#     local machine, but all other file types are opened in the background;
#     this allows nmh to continue on its way without having to wait for the
#     user to close the application

if [ "${subtype}" = "txt"  -o  "${subtype}" = "plain" ]
then
   ${app} "${sourcefile}"
   echo
elif [ "${app}" = "calendar_extract" ]
then
   echo "${ruler}"
   ${app} < "${sourcefile}"
else
   #-- we need the basename of the file to copy into the attachments
   #   directory; we also want to add an extension, but only if the
   #   basename doesn't already include one:

   base="${filename}"
   [ -z "${base}" ] && base="`basename \"${sourcefile}\"`"
   nosuffix=`echo "${base}" | sed 's/\..*//'`
   [ "${base}" = "${nosuffix}" ] && base="${base}.${subtype}"


   #-- construct the full pathname to which the attachment will be copied:

   target="${attach_dir}/${base}"


   #-- are we displaying this attachment locally?

   if [ ${local} -gt 0 ]
   then
      #-- yes, we're on the local desktop machine; the required helper
      #   application can run directly:

      echo
      echo "${ruler}"
      echo "| copying ${sourcefile}"
      echo "|      to ${target}"
      cp -p "${sourcefile}" "${target}"

      echo "|"
      echo "| opening ${target}"
      echo "| with    ${app}"
      echo "${ruler}"
      ${app} "${target}" > /dev/null 2>&1 &
   else
      #-- we're in an ssh session from a remote desktop machine, so first
      #   we have to copy the file to the remote desktop, then start the
      #   application there via ssh:

      echo
      echo "${ruler}"
      echo "| copying ${sourcefile}"
      echo "|      to ${desktop}:${target}"
      scp ${scp_options} "${sourcefile}" \
          ${user}@${desktop}:"\"${target}\""

      echo "|"
      echo "| opening ${target} remotely"
      echo "| with    ${app}"
      echo "${ruler}"

      ssh -nY -l ${user} ${desktop} \
          "setenv DISPLAY :0;${app} \"${target}\"" > /dev/null 2>&1 &
   fi
fi
