From: "Dale R. Worley" <wor...@alum.mit.edu>
From the original poster's point of view: Yes, you can use Git to store
various versions of MS Word documents, but you probably don't get much
benefit from doing so, since Git can't see into the different versions
of documents to see how they differ; to Git they're just blobs.  OTOH,
it may be that "collections of blobs" is all that you need the storage
system to provide.

Konstantin Khomoutov <flatw...@users.sourceforge.net> writes:
"Steve (Gadget) Barnes" <gadgetst...@hotmail.com> wrote:
At the risk of getting flamed for mentioning a different DVCS, the
Mercurial (hg) project has a very sneaky extension called zipdoc
that stores the contents of zip files (docx files are actually zips
containing XML), along with the fact that they belong in a specific
.docx (or whatever) file.  On committing such a file, it is unzipped
and the constituents are either stored or, for an update, diffed; on
a pull, the constituent parts are pulled and then zipped to
reconstitute the original file.

You could either consider using Mercurial or trying to find or
develop a similar extension.

I wonder what this actually buys: you'll end up with a bunch of XML
files (and picture files, if any, and the Manifest file, and so on),
and the problem is that that XML file representing "the content" is as
readable as the original .docx.  As they say, "XML combines the
efficiency of text files with the readability of binary files" [1].
I mean, diffing machine-produced XML files, where a tiny logical
change in a document could result in hefty parts of that XML swath
being rewritten, is just marginally better than the original problem.

The question is this: If you make a small change to the document (as a
human sees it), does this cause a small change to the XML files within
the Zip?  If the answer is Yes, then many revisions of a document can
be stored densely in a repository.  And it might be possible to merge
small differences in documents using a standard merging approach.

But the only way to know would be to talk to someone who has
considerable experience with this.
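
Short of asking someone with that experience, the question can be probed
directly on a pair of revisions. What follows is only a minimal sketch. It
assumes the main text lives in word/document.xml (the usual location in a
.docx written by Word) and that breaking the XML between adjacent tags is a
fair stand-in for a line-oriented diff. The function names are made up for
illustration.

```python
import difflib
import itertools
import zipfile


def read_part(docx_path, part="word/document.xml"):
    """Return one member of a .docx (a Zip archive) as text."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read(part).decode("utf-8")


def changed_lines(old_docx, new_docx, part="word/document.xml"):
    """Count added plus removed lines between the same XML part of two files."""
    # The XML is often written on very few physical lines, so break it
    # between adjacent tags to give a line-based diff something to compare.
    old = read_part(old_docx, part).replace("><", ">\n<").splitlines()
    new = read_part(new_docx, part).replace("><", ">\n<").splitlines()
    diff = difflib.unified_diff(old, new, lineterm="", n=0)
    # Skip the two file-header lines, then count only "+" and "-" lines
    # (hunk markers start with "@@" and are ignored).
    body = itertools.islice(diff, 2, None)
    return sum(1 for line in body if line.startswith(("+", "-")))
```

Running changed_lines() on two saves of the same document that differ by one
word gives a rough feel for how much of the XML gets rewritten. A count in
the single digits would suggest dense storage and line-based merging are
plausible; a count in the thousands would suggest they are not.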

While not having personal experience, I've seen a number of reports
that the 'expanded XML' approach to "docx" style documents (including,
I understand, LibreOffice's), which are zips of XML files, often fails
because the main application presumes that the files inside the zip
are in a particular order.  Once the zip has been expanded, that order
is lost, so when the VCS repackages the zip the components are not in
the right order and the main program can't read the result properly.
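
If member order really is what breaks the round trip, as those reports
suggest, one workaround is to record the order at unpack time and replay it
when repacking. The sketch below does that with Python's zipfile module. The
side file name zip-order.json and both function names are hypothetical, and
the rebuilt archive is not byte-identical to the original (timestamps and
other metadata are not carried over).

```python
import json
import zipfile
from pathlib import Path

ORDER_FILE = "zip-order.json"  # hypothetical side file kept next to the parts


def unpack(docx_path, dest_dir):
    """Extract all members and remember their order and compression type."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(docx_path) as archive:
        # infolist() reports members in the order they appear in the archive.
        entries = [[info.filename, info.compress_type]
                   for info in archive.infolist()]
        archive.extractall(dest)
    (dest / ORDER_FILE).write_text(json.dumps(entries))
    return [name for name, _ in entries]


def repack(src_dir, docx_path):
    """Rebuild the archive, writing members back in their recorded order."""
    src = Path(src_dir)
    entries = json.loads((src / ORDER_FILE).read_text())
    with zipfile.ZipFile(docx_path, "w") as archive:
        for name, compress_type in entries:
            archive.write(src / name, arcname=name,
                          compress_type=compress_type)
```

Recording the compression type as well as the order is a precaution: some
zip-based formats expect particular members to be stored uncompressed.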

The key to all this (doing version differencing) is to locate a method
[program] which can be fed the old and new versions and will present
the diff to you in a meaningful fashion.  Often 'Word' style documents
don't have a good way of doing this that is both meaningful and
compact at the same time (a human factors problem, not a coding
problem ;-) ).

If the OP's originating program has a 'compare documents' mode, then a
small bit of coding should allow Git to feed the old version and new
version to it, as long as it has an external API (rather than
everything being done via GUI menu selection).
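
Where no such compare mode can be scripted, a different and simpler route is
Git's textconv mechanism: Git runs a converter on the old and new versions
and diffs the converter's text output. This is a substitute for the
compare-mode idea above, not an implementation of it. The sketch assumes a
.docx whose body is in word/document.xml; the script name docx2txt.py and the
function name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the document's text, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        paragraphs.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(paragraphs)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Git calls the converter with the path of the file to convert.
    print(docx_to_text(sys.argv[1]))
```

To wire it up, put a line such as "*.docx diff=docx" in .gitattributes and
run git config diff.docx.textconv "python docx2txt.py". The driver name
"docx" is arbitrary. The output loses all formatting, so the diff shows
changes in wording only.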

--

Philip