Ard,

> > By coincidence I discovered that the xml file contains 
> > leading binary characters (ff fe) and that it as a whole is 
> > seen as binary by my text editor. So perhaps this is causing 
> > the duplicate results.

I came across this link: 
http://www.25hoursaday.com/weblog/2005/10/18/TheMythOfTheOfficeXMLBinaryKey.aspx



It mentions the ff fe bytes (the UTF-16 byte order mark, indicating 
little-endian order) that I see at the beginning of my document.

The XML files start with the declaration <?xml version="1.0" encoding="utf-16"?>, 
which specifies the encoding.
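
For anyone who wants to check their own files, here is a minimal Python sketch 
that inspects the first bytes of a document for a byte order mark. The function 
name detect_bom is hypothetical; it only uses the BOM constants from the 
standard codecs module and says nothing about how the CMS itself reads the file.

```python
import codecs
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, or None."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE mark
    # (ff fe 00 00) begins with the UTF-16 LE mark (ff fe).
    candidates = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None
```

Passing it the first few bytes of a file (opened in binary mode) is enough; 
a file starting with ff fe should be reported as utf-16-le.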

> > I'll try to get them removed and see whether the issue is resolved.
> 
> If you could do a test with this, it would give me some pointers indeed...

When I manually overwrite a document (leaving out the two bytes and also the 
encoding), the index is 'repaired' and a search finds only one hit. It looks 
like the leading bytes and the encoding are causing the unexpected search 
results.
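
I did the rewrite by hand, but it could be scripted roughly as follows. This is 
a sketch under assumptions: the function name utf16_xml_to_utf8 is 
hypothetical, it re-encodes to UTF-8 rather than simply deleting the encoding 
attribute as I did manually, and it assumes the input really is UTF-16 with a 
byte order mark.

```python
import re


def utf16_xml_to_utf8(data: bytes) -> bytes:
    """Re-encode a UTF-16 XML document as UTF-8 without a byte order mark."""
    # The "utf-16" codec uses the byte order mark to pick the byte order
    # and does not include the mark in the decoded text.
    text = data.decode("utf-16")
    # Update the encoding named in the XML declaration, if there is one,
    # so it matches the bytes that are written out.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r"\g<1>\g<2>utf-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")
```

The result can be written back with open(path, "wb"); keeping a copy of the 
original file first seems wise.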


--Æde


Hippocms-dev: Hippo CMS development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
