On 12/7/2014 10:35 AM, Andy Bradford wrote:
.... I downloaded your zip file and looked at the files and discovered
that the last few bytes of each file have some control characters
(0x1a, 0x1d), null characters (0x00), and one has an extended ASCII
character, 0xe6.

$ od -x 9s08gw32.s8p | tail -3
0004400     3331    0a0d    3953    3330    3030    3030    4346    0a0d
0004420     1d1a    0000
0004423

$ od -x 9s08rt8.s8p  | tail -3
0003020     3532    0d39    530a    3039    3033    3030    4630    0d43
0003040     1a0a    00e6
0003044

Those files look like device configurations for P&E tools for writing to FLASH memories connected to (or embedded in) specific embedded CPUs. I used P&E's tools with various ColdFire processors on past projects, and had to create my own configuration file for their FLASH programmer to correctly handle one of our projects. IIRC, the files end with a Ctrl+Z and a (badly chosen) checksum.

Ctrl+Z in a file with DOS (really CP/M) heritage is treated as an end-of-file mark. CP/M and some early versions of MS-DOS did not have byte-accurate file sizes, so files were read a block at a time and Ctrl+Z was the signal marking the end of the text content. Many DOS commands respected that, including TYPE. As a result, Ctrl+Z is occasionally used to separate pure text content from binary content in a hybrid file. That legacy is present in Windows today, where the C runtime library treats a Ctrl+Z as EOF if the file is opened in text mode.
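
For anyone who wants to see that text-mode behavior directly, here is a minimal sketch in C, assuming a Windows C runtime; the file name is just the first one from the od dump above, used for illustration. The same file read in text mode stops at the Ctrl+Z, while binary mode reads every byte, checksum and all.

/* Minimal sketch, assuming a Windows C runtime: in text mode ("r")
 * the runtime stops reading at the first Ctrl+Z (0x1a), while in
 * binary mode ("rb") every byte is returned.  The file name is just
 * the example from the od dump above. */
#include <stdio.h>

static long count_bytes(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    long n = 0;
    int c;
    if (f == NULL) return -1;
    while ((c = fgetc(f)) != EOF) n++;
    fclose(f);
    return n;
}

int main(void)
{
    printf("text mode:   %ld bytes\n", count_bytes("9s08gw32.s8p", "r"));
    printf("binary mode: %ld bytes\n", count_bytes("9s08gw32.s8p", "rb"));
    return 0;
}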

The real problem is that once a file is treated as binary I cannot
`diff' it between versions.

Yes, that would be problematic. I wonder if a better heuristic could
be implemented. If some percentage of the file is ASCII printable
characters, maybe it could be treated as non-binary?
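
For what it's worth, a heuristic along those lines could be as simple as the sketch below; the 90% cutoff and the particular set of bytes counted as printable are arbitrary choices of mine for illustration, not anything Fossil currently does.

#include <stddef.h>

/* Sketch of the suggested heuristic: treat a blob as text if at least
 * 90% of its bytes are printable ASCII, TAB, CR, or LF.  Both the
 * threshold and the byte classes are arbitrary. */
static int looks_like_text(const unsigned char *buf, size_t n)
{
    size_t printable = 0, i;
    if (n == 0) return 1;              /* treat an empty file as text */
    for (i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if ((c >= 0x20 && c < 0x7f) || c == '\t' || c == '\r' || c == '\n')
            printable++;
    }
    return printable * 10 >= n * 9;    /* at least 90% printable */
}

For these particular files the ratio would be far above 90%, since only the last few bytes of each are outside that set.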

I wonder how hard it would be for the diff implementation to completely ignore the ASCII/binary question and attempt the diff as if it were a rational text file, leaving it as an issue for presentation only. It could simply fail over to a binary comparison if any assumptions get violated, such as an assumption about line length.

For presentation, the non-printable bytes could either be printed verbatim anyway (under some unspecified code-page assumption) or translated to a "safe" replacement character.

This would handle these files rationally, since the bulk of each file actually consists of 80-column or shorter CRLF-delimited lines, leaving only the question of how to display the matching 1A and 00 bytes and the differing checksum bytes.
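
As a rough sketch of the replacement-character option, the display step could be no more than something like this; the choice of '.' as the stand-in is arbitrary, and a Unicode replacement character would do just as well:

#include <stddef.h>

/* Copy a line for display, substituting '.' for any byte that is not
 * printable ASCII or TAB.  The caller must supply an output buffer of
 * at least n + 1 bytes.  The replacement character is arbitrary. */
static void sanitize_for_display(const unsigned char *in, size_t n, char *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned char c = in[i];
        out[i] = ((c >= 0x20 && c < 0x7f) || c == '\t') ? (char)c : '.';
    }
    out[n] = '\0';
}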

--
Ross Berteig                               r...@cheshireeng.com
Cheshire Engineering Corp.           http://www.CheshireEng.com/

