I'm using direct SQL queries on Fossil repository databases to find file
blobs, similar to the query Fossil itself uses to generate the /bloblist
web page:

http://fossil-scm.org/index.html/artifact?name=d99ecd15&ln=837-847

With a list of file blobs, it's possible to find files that are present in
one repository but missing from another. Or, by printing file blobs to
stdout and piping the output to other tools, it's possible to write scripts
that perform full-text searches over the entire repository contents from
the command line.

For me, these are more great examples of the power of Fossil!

Since I was using SQL anyway, I dumped the file blobs with the following
SQL command:

[a] SELECT content(blob.uuid) FROM blob WHERE blob.rid = "RID";

I'm aware that this is not the documented way to dump a file blob; the
documented ways are:

[b] fossil artifact "rid:RID"
[c] fossil cat filename -r VERSION

(Finding VERSION from a RID or UUID requires further SQL queries.)

While [b] and [c] generate output with the same line endings as the
original files, method [a] generates output with different line endings.
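To check which line endings a file actually contains, it helps to count
CR+LF pairs, lone CRs and lone LFs directly on the raw bytes. A minimal
Python sketch (the name `count_eols' is hypothetical, not part of Fossil):

```python
import sys


def count_eols(data: bytes) -> dict:
    """Count CR+LF pairs, lone CRs and lone LFs in a byte string."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x0D:  # CR
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                counts["CRLF"] += 1  # CR immediately followed by LF
                i += 2
                continue
            counts["CR"] += 1  # CR not followed by LF
        elif b == 0x0A:  # LF not preceded by CR
            counts["LF"] += 1
        i += 1
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # Binary mode, so Python itself performs no newline translation.
        with open(path, "rb") as f:
            c = count_eols(f.read())
        print(f"{path}: CRLF={c['CRLF']} CR={c['CR']} LF={c['LF']}")
```

Running it on an original "file-*.txt" and on the matching "dump-*.txt"
files shows the differences without relying on an editor's display.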

The following Windows batch script demonstrates this. It uses the built-in
Windows `certutil' command to generate test files with CR+LF, LF, or CR
line endings, which are then committed to a new repository. Next, the file
blobs are dumped to separate files with each of the methods [a], [b], and
[c].

------ file: test-eol.cmd ------
@echo off

echo 0000 43 52 0d 43 52 0d 43 52  0d 43 52 0d 43 52 0d      CR.CR.CR.CR.CR.> hex-cr.txt
certutil -decodehex -f hex-cr.txt file-cr.txt
del hex-cr.txt

echo 0000 43 52 4c 46 0d 0a 43 52  4c 46 0d 0a 43 52 4c 46   CRLF..CRLF..CRLF> hex-crlf.txt
echo 0010 0d 0a 43 52 4c 46 0d 0a  43 52 4c 46 0d 0a         ..CRLF..CRLF..>> hex-crlf.txt
certutil -decodehex -f hex-crlf.txt file-crlf.txt
del hex-crlf.txt

echo 0000 4c 46 0a 4c 46 0a 4c 46  0a 4c 46 0a 4c 46 0a      LF.LF.LF.LF.LF.> hex-lf.txt
certutil -decodehex -f hex-lf.txt file-lf.txt
del hex-lf.txt

fossil init --admin-user florian test.fossil
fossil open test.fossil

fossil add file-cr.txt file-crlf.txt file-lf.txt

fossil ci --user florian -m "EOL Test" --no-warnings

fossil artifact "rid:2"> dump-artifact-rid2.txt
fossil artifact "rid:3"> dump-artifact-rid3.txt
fossil artifact "rid:4"> dump-artifact-rid4.txt

fossil cat file-cr.txt -r tip> dump-cat-file-cr.txt
fossil cat file-crlf.txt -r tip> dump-cat-file-crlf.txt
fossil cat file-lf.txt -r tip> dump-cat-file-lf.txt

echo SELECT content^^^(blob.uuid^^^) FROM blob WHERE blob.rid = "2"; | fossil sql> dump-sql-rid2.txt
echo SELECT content^^^(blob.uuid^^^) FROM blob WHERE blob.rid = "3"; | fossil sql> dump-sql-rid3.txt
echo SELECT content^^^(blob.uuid^^^) FROM blob WHERE blob.rid = "4"; | fossil sql> dump-sql-rid4.txt

fossil close
------ file: test-eol.cmd ------
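On systems without `certutil', the same three test files can be written
byte for byte by any language with binary file output. A minimal Python
sketch, assuming the file names from the script above (the names
`TEST_FILES' and `write_test_files' are hypothetical):

```python
from pathlib import Path

# Exact byte contents of the test files decoded by certutil in test-eol.cmd.
TEST_FILES = {
    "file-cr.txt": b"CR\r" * 5,        # 43 52 0d, five times
    "file-crlf.txt": b"CRLF\r\n" * 5,  # 43 52 4c 46 0d 0a, five times
    "file-lf.txt": b"LF\n" * 5,        # 4c 46 0a, five times
}


def write_test_files(directory: Path) -> None:
    """Write each test file; write_bytes performs no newline translation."""
    for name, data in TEST_FILES.items():
        (directory / name).write_bytes(data)


if __name__ == "__main__":
    write_test_files(Path("."))
```

The remaining steps (`fossil init', `fossil add', `fossil ci' and the
three dump methods) stay the same as in the batch script.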

The resulting "dump-artifact-*.txt" and "dump-cat-file-*.txt" files are all
identical to the original "file-*.txt" files, but the "dump-sql-*.txt"
files have different line endings:

CR → CR, with an extra CR+LF at the end
CR+LF → CR+CR+LF, with an extra CR+LF at the end
LF → CR+LF, with an extra CR+LF at the end
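These three mappings are consistent with one simple hypothesis (an
assumption, not verified here): the stored blob is unchanged, but the SQL
shell behind `fossil sql' ends each result row with an LF, and on Windows
every LF written to stdout is expanded to CR+LF, as the C runtime does for
streams in text mode. The Python sketch below models only that hypothesis
(the function name is hypothetical) and reproduces all three observations:

```python
def simulate_text_mode_dump(blob: bytes) -> bytes:
    """Model of the hypothesis: append a row separator, then expand LF to CR+LF."""
    # Assumption: the SQL shell terminates each result row with a single LF.
    row = blob + b"\n"
    # Assumption: text-mode output on Windows turns every LF into CR+LF,
    # including an LF that already follows a CR (hence CR+CR+LF).
    return row.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    cases = {
        "CR": b"CR\r" * 5,
        "CR+LF": b"CRLF\r\n" * 5,
        "LF": b"LF\n" * 5,
    }
    for name, blob in cases.items():
        print(f"{name:6} -> {simulate_text_mode_dump(blob)!r}")
```

If this hypothesis holds, the data stored in the repository is intact, and
only the way `fossil sql' prints it differs.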

I don't consider this a problem or even a bug, and the workaround is to
use method [b] to dump a file blob. However, I wanted to report the
behavior, as I don't see the logic behind it (though this may just be me),
and I'm somewhat surprised that SQL queries return "modified data", which
could matter if somebody relies on similar SQL queries for backups. Or is
it just the Fossil-specific `content(uuid)' SQL function doing some line
ending magic for internal use, which is undone before Fossil uses the blob?

--Florian
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
