On Thu, Dec 14, 2006 at 06:38:26PM +0100, René Peinl wrote:
> Hi guys,
> I'm sorry to bother you with this, but you seem to be the only people
> reachable outside Microsoft who know about the XLS file format. I've worked
> a lot with the XML versions myself and also made some minor enhancements to
> the OO XSLTs that convert WordML to OO, but now I need some help with the
> binary version.
> I'm trying to write a comparison function that compares two versions of a
> document with each other and should return true if the documents have the
> same content and false otherwise. I'm using an MD5 hash to do this.
> The reason is that I want to eliminate versions of documents in Sharepoint
> where only the metadata has changed. Unfortunately, Sharepoint is so clever that
> it writes metadata not only into its own database, but also into the
> document itself, if it is an Office document type.
> Therefore I want to strip off the header (and trailer) that contains
> metadata. For doc files this is quite easy. I just had to remove (or
> overwrite with zeros) the first 2554 and the last 1520 bytes and compare the
> files afterwards.
> Unfortunately, this strategy does not work with XLS files. It seems that every
> sheet inside the file has its own copy of the metadata.
> Can you give me any advice on how to get rid of the metadata (just for the
> comparison)? Is there any byte sequence I can search for and then overwrite
> the next x bytes with zeros?
> I would be really thankful for any help.
> Thanks a lot and best regards
> René
The metadata is stored in the standard OLE2 format, so you cannot rely
on it being at a specific byte position in the file. There are simple
tools available to dump the content (e.g. via libgsf), but you'd need
to write something yourself if your goal is to strip out some of the
properties. The code in libgsf (C) or HPSF/POI (Java) should make
that fairly simple.
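A minimal sketch of the comparison, with one simplification over the approach described above: instead of stripping individual properties, it skips the two standard OLE2 property set streams (\005SummaryInformation and \005DocumentSummaryInformation) entirely and hashes every other stream. It assumes the streams have already been read out of the container with an OLE2 reader such as libgsf or POI, and that all the metadata Sharepoint writes ends up in those two streams (custom properties normally live in the second section of \005DocumentSummaryInformation). The `streams` mapping, its "/"-separated path names, and the functions `content_digest` and `same_content` are hypothetical. It hashes logical stream contents rather than raw file bytes because the container may place the same stream at different offsets.

```python
import hashlib

# Names of the standard OLE2 property set streams that carry document metadata.
# The leading "\x05" is a literal control character in the stream name.
PROPERTY_SET_STREAMS = {
    "\x05SummaryInformation",
    "\x05DocumentSummaryInformation",
}


def content_digest(streams):
    """Return an MD5 hex digest over all streams except the property sets.

    `streams` is a hypothetical dict of full stream path -> bytes, assumed to
    have been extracted already from the OLE2 container (e.g. with libgsf or
    POI). Paths are assumed to use "/" between storage and stream names.
    """
    md5 = hashlib.md5()
    # Sort by name so the digest does not depend on directory order.
    for name in sorted(streams):
        # Skip property set streams at any storage level.
        if name.split("/")[-1] in PROPERTY_SET_STREAMS:
            continue
        data = streams[name]
        # Include the name and length so that moving bytes between
        # streams changes the digest.
        md5.update(name.encode("utf-8"))
        md5.update(len(data).to_bytes(8, "little"))
        md5.update(data)
    return md5.hexdigest()


def same_content(streams_a, streams_b):
    """True if both documents match once the property sets are ignored."""
    return content_digest(streams_a) == content_digest(streams_b)
```

For example, two versions whose Workbook streams are identical but whose \005SummaryInformation streams differ compare as equal, while any change to the Workbook stream makes them differ.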
Documentation on OLE2 properties is available at
http://jakarta.apache.org/poi/hpsf/
where you can also find documentation on the OLE2 container format
itself.