On 12/7/2014 10:35 AM, Andy Bradford wrote:
.... I downloaded your zip file and looked at the files and discovered
that the last few bytes of each file have some control characters
(0x1a, 0x1d), null characters (0x00), and one has an extended ASCII
character, 0xe6.

$ od -x 9s08gw32.s8p | tail -3
0004400     3331    0a0d    3953    3330    3030    3030    4346    0a0d
0004420     1d1a    0000
0004423

$ od -x 9s08rt8.s8p  | tail -3
0003020     3532    0d39    530a    3039    3033    3030    4630    0d43
0003040     1a0a    00e6
0003044

Those files look like device configurations for P&E tools for writing to FLASH memories connected to (or embedded in) specific embedded CPUs. I used P&E's tools with various ColdFire processors on past projects, and had to create my own configuration file for their FLASH programmer to correctly handle one of our projects. IIRC, the files end with a Ctrl+Z and a (badly chosen) checksum.

Ctrl+Z in a file with DOS (really CP/M) heritage is treated as an end-of-file mark. CP/M and some early versions of MS-DOS did not have byte-accurate file sizes, so files were read a block at a time and Ctrl+Z was the signal marking the end of the text content. Many DOS commands respected that, including TYPE. As a result, Ctrl+Z is occasionally used to separate pure text content from binary content in a hybrid file. That legacy is present in Windows today, where the C runtime library treats a Ctrl+Z as EOF if the file is opened in text mode.
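
For anyone who wants to see that text-mode behavior directly, here is a minimal sketch in C, assuming a Windows C runtime; the file name is just the first one from the od dump above, used for illustration. The same file read in text mode stops at the Ctrl+Z, while binary mode reads every byte, checksum and all.

/* Minimal sketch, assuming a Windows C runtime: in text mode ("r")
 * the runtime stops reading at the first Ctrl+Z (0x1a), while in
 * binary mode ("rb") every byte is returned.  The file name is just
 * the example from the od dump above. */
#include <stdio.h>

static long count_bytes(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    long n = 0;
    int c;
    if (f == NULL) return -1;
    while ((c = fgetc(f)) != EOF) n++;
    fclose(f);
    return n;
}

int main(void)
{
    printf("text mode:   %ld bytes\n", count_bytes("9s08gw32.s8p", "r"));
    printf("binary mode: %ld bytes\n", count_bytes("9s08gw32.s8p", "rb"));
    return 0;
}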

The real problem is that once a file is treated as binary I cannot
`diff' it between versions.

Yes, that would be problematic. I wonder if a better heuristic could
be implemented. If some percentage of the file is ASCII printable
characters, maybe it could be treated as non-binary?
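
For what it's worth, a heuristic along those lines could be as simple as the sketch below; the 90% cutoff and the particular set of bytes counted as printable are arbitrary choices of mine for illustration, not anything Fossil currently does.

#include <stddef.h>

/* Sketch of the suggested heuristic: treat a blob as text if at least
 * 90% of its bytes are printable ASCII, TAB, CR, or LF.  Both the
 * threshold and the byte classes are arbitrary. */
static int looks_like_text(const unsigned char *buf, size_t n)
{
    size_t printable = 0, i;
    if (n == 0) return 1;              /* treat an empty file as text */
    for (i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if ((c >= 0x20 && c < 0x7f) || c == '\t' || c == '\r' || c == '\n')
            printable++;
    }
    return printable * 10 >= n * 9;    /* at least 90% printable */
}

For these particular files the ratio would be far above 90%, since only the last few bytes of each are outside that set.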

I wonder how hard it would be for the diff implementation to completely ignore the ASCII/binary question and attempt the diff as if it were a rational text file, leaving it as an issue for presentation only. It could simply fail over to a binary comparison if any assumptions get violated, such as an assumption about line length.

For presentation, the non-printable bytes could either be printed verbatim anyway (under some unspecified code-page assumption) or translated to a "safe" replacement character.

This would handle these files rationally, since the bulk of each file actually consists of 80-column or shorter CRLF-delimited lines, leaving only the question of how to display the matching 1A and 00 bytes and the differing checksum bytes.
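
As a rough sketch of the replacement-character option, the display step could be no more than something like this; the choice of '.' as the stand-in is arbitrary, and a Unicode replacement character would do just as well:

#include <stddef.h>

/* Copy a line for display, substituting '.' for any byte that is not
 * printable ASCII or TAB.  The caller must supply an output buffer of
 * at least n + 1 bytes.  The replacement character is arbitrary. */
static void sanitize_for_display(const unsigned char *in, size_t n, char *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned char c = in[i];
        out[i] = ((c >= 0x20 && c < 0x7f) || c == '\t') ? (char)c : '.';
    }
    out[n] = '\0';
}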

--
Ross Berteig                               r...@cheshireeng.com
Cheshire Engineering Corp.           http://www.CheshireEng.com/

