On 12/7/2014 10:35 AM, Andy Bradford wrote:
.... I downloaded your zip file and looked at the
files and discovered that the last few bytes of each file have some
control characters (0x1a, 0x1d), null characters (0x00) and one has an
extended ASCII character 0xe6.
$ od -x 9s08gw32.s8p | tail -3
0004400 3331 0a0d 3953 3330 3030 3030 4346 0a0d
0004420 1d1a 0000
0004423
$ od -x 9s08rt8.s8p | tail -3
0003020 3532 0d39 530a 3039 3033 3030 4630 0d43
0003040 1a0a 00e6
0003044
Those files look like device configurations for P&E tools for writing to
FLASH memories connected to (or embedded in) specific embedded CPUs. I
used P&E's tools with various ColdFire processors on past projects, and
had to create my own configuration file for their FLASH programmer to
correctly handle one of our projects. IIRC, the files end with a Ctrl+Z
and a (badly chosen) checksum.
Ctrl+Z in a DOS (really CP/M) heritage file is treated as an end-of-file
mark. CP/M and some early versions of MS-DOS did not have byte-accurate
file sizes, so files were read a block at a time and Ctrl+Z was the
signal to mark the end of the text content. Many DOS commands respected
that, including TYPE. As a result, Ctrl+Z is occasionally used to
separate pure text content from binary content in a hybrid file. That
legacy is present today in Windows, where the C runtime library treats a
Ctrl+Z as EOF if the file is opened in text mode.
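
To make the hybrid layout concrete, here is a minimal Python sketch that splits a file's bytes at the first Ctrl+Z, the way a Ctrl+Z-aware reader would stop reading text. The function name `split_at_ctrl_z` is made up for illustration, and the sample bytes are the tail of the first `od` dump above, rewritten in file byte order (`od -x` prints 16-bit words, so each byte pair appears swapped on a little-endian machine).

```python
from typing import Tuple

CTRL_Z = b"\x1a"  # DOS/CP/M end-of-text marker


def split_at_ctrl_z(data: bytes) -> Tuple[bytes, bytes]:
    """Return (text_part, trailer).

    The trailer starts at the first Ctrl+Z byte; it is empty if the
    data contains no Ctrl+Z at all.
    """
    index = data.find(CTRL_Z)
    if index == -1:
        return data, b""
    return data[:index], data[index:]


# Last record and trailer of 9s08gw32.s8p, taken from the dump above.
sample = b"S9030000FC\r\n\x1a\x1d\x00"
text, trailer = split_at_ctrl_z(sample)
print(text)           # b'S9030000FC\r\n'
print(trailer.hex())  # 1a1d00
```

Under this reading, everything before the Ctrl+Z is ordinary CRLF-delimited text, and only the short trailer (Ctrl+Z, one differing byte, a null) is binary.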
The real problem is that once a file is treated as binary I cannot
`diff' it between versions.
Yes, that would be problematic. I wonder if a better heuristic could
be implemented. If some percentage of the file is ASCII printable
characters maybe it could be treated as non-binary?
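
The percentage idea could be sketched roughly as follows. This is only an illustration of the suggestion, not Fossil's actual binary check; the name `looks_like_text` and the 95% threshold are assumptions picked for the example.

```python
def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Guess that data is text when enough bytes are printable ASCII.

    Tab, LF and CR count as text. The threshold is an arbitrary
    assumption and would need tuning against real files.
    """
    if not data:
        return True
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    printable = sum(1 for byte in data if byte in allowed)
    return printable / len(data) >= threshold


# 50 S-record-style lines followed by a 3-byte binary trailer.
mostly_text = b"S9030000FC\r\n" * 50 + b"\x1a\x1d\x00"
print(looks_like_text(mostly_text))       # True
print(looks_like_text(bytes(range(256))))  # False
```

A file like the ones discussed here, with a few non-text bytes at the very end, passes easily, while data spread evenly over all byte values does not.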
I wonder how hard it would be for the diff implementation to completely
ignore the ASCII/binary question and attempt the diff as if it were a
rational text file, leaving it as an issue for presentation only. It
could simply fail over to a binary comparison if any assumptions get
violated, such as an assumption about line length.
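
A rough Python sketch of that approach, using the standard `difflib` module rather than Fossil's own diff code: lines are split on the raw bytes, the diff is attempted unconditionally, and the only fallback trigger is an over-long line. The name `diff_or_compare` and the 8192-byte limit are assumptions for illustration.

```python
import difflib

MAX_LINE = 8192  # hypothetical line-length limit


def _decode(line: bytes) -> str:
    # latin-1 maps every byte to exactly one character, so nothing is lost.
    return line.decode("latin-1")


def diff_or_compare(old: bytes, new: bytes, max_line: int = MAX_LINE) -> list:
    """Diff as text; fall back to a same/different answer on long lines."""
    # Split on bytes first: bytes.splitlines() only breaks on CR, LF and
    # CRLF, so control bytes such as 0x1d do not create extra lines.
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    if any(len(line) > max_line for line in old_lines + new_lines):
        return [] if old == new else ["Binary files differ\n"]
    return list(difflib.unified_diff(
        [_decode(line) for line in old_lines],
        [_decode(line) for line in new_lines],
        "old", "new"))


# Two made-up files shaped like the ones above: CRLF text plus a trailer.
old = b"S1130000AA\r\nS9030000FC\r\n\x1a\x1d\x00"
new = b"S1130000BB\r\nS9030000FC\r\n\x1a\xe6\x00"
result = diff_or_compare(old, new)
```

Here `result` holds an ordinary unified diff in which the trailer shows up as one more changed line, so how to print those bytes is the only open question.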
For presentation, the non-printable bytes could either be printed anyway
verbatim (consistent with some unknown code page assumption) or
translated to a "safe" replacement character.
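
The second option might look something like this in Python. The name `safe_display` and the choice of `.` as the replacement are assumptions; passing `replacement=None` switches to hex escapes, an alternative not mentioned above.

```python
from typing import Optional


def safe_display(line: bytes, replacement: Optional[str] = ".") -> str:
    """Render one line for display, hiding non-printable bytes.

    The trailing CR/LF is dropped. Bytes outside printable ASCII become
    `replacement`, or a \\xNN escape when replacement is None.
    """
    out = []
    for byte in line.rstrip(b"\r\n"):
        if 0x20 <= byte < 0x7F:
            out.append(chr(byte))
        elif replacement is None:
            out.append("\\x%02x" % byte)
        else:
            out.append(replacement)
    return "".join(out)


print(safe_display(b"S9030000FC\r\n"))                  # S9030000FC
print(safe_display(b"\x1a\x1d\x00"))                    # ...
print(safe_display(b"\x1a\xe6\x00", replacement=None))  # \x1a\xe6\x00
```

With a single replacement character, two trailers that differ only in one byte print identically, so an escape form may be the more useful presentation for the differing lines.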
This would handle these files rationally since the bulk of the files
actually have 80-column or shorter CRLF-delimited lines, and only leave
a question of how to display the matching 1A and 00 bytes and the
differing checksum bytes.
--
Ross Berteig r...@cheshireeng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users