Does the "mode" (text/binary) factor into the SHA1 fingerprint hash of an artifact (single file) or commit?
-bch

On 12/8/14, Martin Gagnon <eme...@gmail.com> wrote:
> On Mon, Dec 08, 2014 at 12:09:57PM -0800, Shal Farley wrote:
>> Stephan,
>>
>> > If it has ANY bytes above 127, it's not, by definition, ASCII. i.e.
>> > "it's binary."
>>
>> I would disagree with part of this statement. I agree that ASCII
>> defines only the 7-bit code values, but I think this whole thread
>> has run off the rails in talking about the content values as
>> determining whether the file is "text" or "binary".
>>
>> But this discussion of content heuristics misses the point of why
>> there is a distinction to be made in the first place. And that I
>> think has more to do with whether the content is organized into
>> "lines".
>>
>> In a functional sense for Fossil, a "text" file is one for which it
>> is useful to display a line-oriented difference. For all other files
>> ("binary" files) the difference can only be displayed in a way that
>> is agnostic of the internal structure (if any) of the content.
>>
>> Given that there is no universal heuristic for discriminating "text"
>> from "binary" files based on content, that determination must be
>> treated as a bit of metadata about the file.
>>
>> Likewise, it is necessary to know for a given file what
>> representation is used to separate lines. Knowledge of the line
>> separator is seldom carried as metadata, because it is usually
>> uniform in a given system. But in these days of interoperable
>> systems and multi-platform support, this detail also may be a
>> necessary piece of metadata to know about a file. ASCII code calls
>> out the CR (carriage return) and LF (line-feed) control characters.
>> DOS-based systems (including Windows) follow the direct ASCII
>> tradition of using CR and LF, paired in that order (and often
>> represented as CRLF) as the line separator. That tradition is also
>> embodied in the Internet Mail standards for message content, header
>> and body (absent MIME extensions). Unix-based systems use the LF
>> character alone as the line separator in files (aka "newline").
>> Other systems have used CR alone.
>>
>> And additionally, the character set used to represent text in a file
>> must also be carried as metadata (because of the ISO-8859 and other
>> code-page based character sets).
>>
>> Only if all these items of metadata are known can the file content,
>> or differences in the file content, be displayed in a useful form.
>> So returning to this thread, it is convenient to have a heuristic
>> that works most of the time to discriminate "text" from "binary"
>> files, but it is necessary to also have a way for the user to
>> explicitly provide that metadata (and ideally the character set
>> metadata).
>>
>
> +1
>
> So if I summarize, we could implement a kind of: text-glob setting that
> work like binary-glob, except it would force text instead of binary. If
> a file doesn't match any of those 2 glob setting, the default fossil
> heuristic would be used.
>
> Does it make sense ?
>
> --
> Martin G.
>
> _______________________________________________
> fossil-users mailing list
> fossil-users@lists.fossil-scm.org
> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
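
For illustration only, the lookup order Martin summarizes (explicit glob settings first, content heuristic as the fallback) might be sketched in Python as follows. This is not Fossil code: the names classify and looks_binary are hypothetical, the NUL-byte check is merely a stand-in for whatever heuristic Fossil really applies, and letting the binary globs win when both lists match is an assumption the thread does not settle.

```python
from fnmatch import fnmatchcase


def looks_binary(content: bytes) -> bool:
    # Stand-in heuristic (an assumption, not Fossil's actual check):
    # treat the presence of any NUL byte as a sign of binary content.
    return b"\x00" in content


def classify(path, content, text_globs=(), binary_globs=()):
    """Return "text" or "binary" for a file.

    Explicit glob settings act as user-supplied metadata and override
    the content heuristic; the heuristic is only the fallback.
    """
    # Assumption: binary globs take precedence if both lists match.
    if any(fnmatchcase(path, g) for g in binary_globs):
        return "binary"
    if any(fnmatchcase(path, g) for g in text_globs):
        return "text"
    return "binary" if looks_binary(content) else "text"
```

With this sketch, a file matching a text glob is reported as text even if its bytes would trip the heuristic, which is the "force text" behaviour proposed above.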