[cross-posted because people on the Cocoon list might hit this as well]

I've always tested Xindice with English documents, so I didn't notice this behavior until today, when I imported an Italian XML document.
The document is encoded using UTF-8 and looks like this:

  <?xml version="1.0" encoding="UTF-8"?>
  ...
  <subtitle>
   In sempre più film il computer con la Mela è l'arma dei giusti
   contro criminali di ogni specie che invece preferiscono i pc
  </subtitle>
  ...

[this is a news document taken from an Italian on-line newspaper]

  ù -> Ã¹   (bytes C3 B9)
  è -> Ã¨   (bytes C3 A8)

are the two UTF-8 byte sequences for the non-ASCII characters, shown here as single-byte characters (since UTF-8 is backward compatible with ASCII, you don't notice any difference until you use non-ASCII letters such as these).

Opening the document in Explorer or XML-Spy yields the correct characters.

Then I import it into the database, and when I access it from the Cocoon XML:DB source I get (in the Explorer window):

  <?xml version="1.0" encoding="UTF-8" ?>
  ...
  <subtitle>
   In sempre piÃ¹ film il computer con la Mela Ã¨ l'arma dei giusti
   contro criminali di ogni specie che invece preferiscono i pc
  </subtitle>

I get the same thing when opening the source in the Notepad window. But Notepad on Win2k is Unicode-aware, so it doesn't show the raw bytes. So I saved the source to disk and opened it with UltraEdit (which is Unicode-aware but has a nice binary view) and voilà:

  ...
  <subtitle>
   In sempre piÃƒÂ¹ film il computer con la Mela ÃƒÂ¨ l'arma dei giusti
   contro criminali di ogni specie che invece preferiscono i pc
  </subtitle>
  ...

where I believe that

  Ã -> Ãƒ   (bytes C3 83)
  ¹ -> Â¹   (bytes C2 B9)

This similarity in encoding probably explains why nobody noticed this before.

So I went directly into news.tbl and got the same bytes:

  n sempre piÃƒÂ¹ film il compu
  ter con la Mela ÃƒÂ¨ l'arma d
  ei giusti

which clearly indicates that the 'xindice' command line import tool is somehow ignoring the 'UTF-8' encoding declaration and performing UTF-8 encoding on something that is *already* UTF-8 encoded.

My perception is that there is nothing wrong with the way Xindice or Cocoon get the information *out* of the database: the problem lies in how the information gets *into* the database.

I would suggest that the Xindice dev community consider this bug a showstopper for the 1.0 final release.
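The double encoding described above can be reproduced with a short Python sketch. It assumes the import step misreads the UTF-8 bytes as ISO-8859-1 (Latin-1) and then writes the result out as UTF-8 again; that is a guess about the mechanism, not something confirmed from the Xindice source. The function names `double_encode` and `repair` are made up for this illustration.

```python
# Sketch of the suspected double UTF-8 encoding.
# Assumption: the importer decodes the UTF-8 bytes as Latin-1
# (one character per byte) and then re-encodes them as UTF-8.

def double_encode(text: str) -> bytes:
    """Return the bytes produced by encoding `text` as UTF-8 twice."""
    utf8_bytes = text.encode("utf-8")          # correct on-disk bytes
    misread = utf8_bytes.decode("latin-1")     # each byte taken as one character
    return misread.encode("utf-8")             # second, unwanted encoding


def repair(stored: bytes) -> str:
    """Undo one layer of the double encoding (hypothetical helper)."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    for ch in ("ù", "è"):
        once = ch.encode("utf-8")
        twice = double_encode(ch)
        print(ch, "UTF-8:", hex_dump(once), "double-encoded:", hex_dump(twice))
    # Round trip: the damaged bytes can be turned back into the original text.
    print(repair(double_encode("più")))
```

If the bytes found in news.tbl match the double-encoded column (C3 83 C2 B9 for ù, C3 83 C2 A8 for è), that supports the idea that the damage happens on the way in, and `repair` shows that already-imported data would in principle be recoverable.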
-- 
Stefano Mazzocchi <[EMAIL PROTECTED]>
One must still have chaos in oneself to be able to give birth to a dancing star.
Friedrich Nietzsche