On Wed, Oct 1, 2008 at 1:35 PM, Dominic Mitchell <[EMAIL PROTECTED]> wrote:
> On 30 Sep 2008, at 20:08, Gnana Arasan wrote:
>
> We are inserting the XML (UTF-8) content using
> session.insertContent(uri,inputstream,options). By default the option
> encoding is UTF-8 (ML version 3.5-2). For example, the person name josé
> is stored. In cq, using doc(uri), the content appears as JosÃ©.
>
>
> The thing to do is to check the string-length() of "JosÃ©".
>
> If it's 4, then it's being stored correctly in MarkLogic. This means that
> the issue is to do with output: something is interpreting UTF-8 as
> ISO-8859-1.
>
> If it's 5, then it's being stored incorrectly in MarkLogic. This means that
> the input processes you thought were sending in UTF-8 are really
> interpreting the data as ISO-8859-1. I'd guess from your input mail that
> you're using Java to read the content in. I'd be *extremely* careful in
> Java, as it's all too easy to use the "system default encoding" by accident.
> This is normally cp-1252 on Windows, or MacRoman on a Mac, neither of which
> is particularly useful.
>
> Any time you read data in Java, you need to specify an encoding.
> Particular candidates to watch out for include
> FileReader<http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileReader.html>
> and
> String.getBytes()<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#getBytes()>.
> If you examine the code that's creating that inputStream, you may well find
> such an example.
>
> -Dom
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>
>

Hi Dom,

Thanks. I had made the same mistake you mentioned in Java. I am now able to
search for diacritics.

-Gnana Arasan.M
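For anyone who finds this thread later, here is a minimal sketch of the 4-versus-5 check Dom describes, done on the Java side. It is not the code from my application: the class name EncodingCheck and the helper readAll are made up for illustration, and StandardCharsets assumes Java 7 or later (on the 1.5 runtime linked above, Charset.forName("UTF-8") would be the equivalent). The name is written with Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Hypothetical helper: decode the whole stream with an explicitly named
    // charset, instead of relying on the platform default encoding.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "José": the e-acute takes two bytes in UTF-8, so 5 bytes in total.
        byte[] utf8 = "Jos\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoded as UTF-8, the two bytes become one character again.
        String good = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Decoded as ISO-8859-1, each byte becomes its own character.
        String bad = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(good.length()); // prints 4
        System.out.println(bad.length());  // prints 5
    }
}
```

The second decode is the mistake: the same five bytes turn into five characters, which matches the string-length() of 5 that Dom says indicates the content was misread before it reached the server.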
