On Wed, Oct 1, 2008 at 1:35 PM, Dominic Mitchell <[EMAIL PROTECTED]> wrote:

>   On 30 Sep 2008, at 20:08, Gnana Arasan wrote:
>
>    We are inserting the XML (UTF-8) content using
> session.insertContent(uri, inputstream, options). By default the options
> encoding is UTF-8 (ML version 3.5-2). For example, the person name José is
> stored, but in cq, doc(uri) shows the content as JosÃ©.
>
>
> The thing to do is to check the string-length() of "JosÃ©".
>
> If it's 4, then it's being stored correctly in MarkLogic.  This means that
> the issue is to do with output — something is interpreting UTF-8 as
> ISO-8859-1.
>
> If it's 5 then it's being stored incorrectly in MarkLogic.  This means that
> the input processes you thought were sending in UTF-8 are really
> interpreting the data as ISO-8859-1.  I'd guess from your input mail that
> you're using Java to read the content in.  I'd be *extremely* careful in
> Java, as it's all too easy to use the "system default encoding" by accident.
> This is normally windows-1252 (Cp1252) on Windows, or MacRoman on a Mac,
> neither of which is particularly useful.
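A minimal Java sketch of the diagnosis above, assuming the stored value is the UTF-8 encoding of "José" that was mistakenly decoded as ISO-8859-1. It reproduces the 4-versus-5 character difference that string-length() exposes. The class and method names (EncodingCheck, misdecode) are hypothetical. The source uses \u escapes so the example does not itself depend on the compiler's default source encoding, and for these characters Java's String.length() (UTF-16 code units) matches XQuery's string-length().

```java
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Simulates the bug (hypothetical helper): the UTF-8 bytes of the text
    // are decoded as ISO-8859-1 instead of UTF-8, producing mojibake.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "Jos\u00e9";       // "José": 4 characters
        String broken = misdecode(name);  // "JosÃ©": 5 characters (0xC3 0xA9 -> Ã ©)

        // Note: how these print depends on the console's own encoding.
        System.out.println(name + " -> length " + name.length());
        System.out.println(broken + " -> length " + broken.length());
    }
}
```

A length of 5 in MarkLogic therefore points at the input side, as described above.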
>
> Any time you convert between bytes and characters in Java, you need to
> specify an encoding.  Particular candidates to watch out for include
> FileReader <http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileReader.html>
> and String.getBytes()
> <http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#getBytes()>.
> If you examine the code that's creating that inputStream, you may well find
> such an example.
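A minimal sketch of the fix, naming the encoding explicitly wherever bytes become characters or characters become bytes. It assumes Java 7 or later for StandardCharsets (on Java 5 the charset name "UTF-8" can be passed as a string instead). ExplicitEncoding, toUtf8 and readAll are hypothetical helpers, not part of MarkLogic's API. For a file on disk, new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8) would replace new FileReader(path).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {

    // Risky: text.getBytes() and new FileReader(path) silently use the
    // platform default encoding (e.g. windows-1252 or MacRoman).

    // Safer: encode characters to bytes with an explicit charset.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Safer: decode bytes to characters with an explicit charset.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<person><name>Jos\u00e9</name></person>";
        // Build the InputStream from explicitly UTF-8-encoded bytes.
        InputStream in = new ByteArrayInputStream(toUtf8(xml));
        // Decoding with the same explicit charset round-trips cleanly.
        System.out.println(readAll(in).equals(xml)); // true
    }
}
```

If the XML is already a UTF-8 file, passing a FileInputStream straight through, without decoding and re-encoding it in Java at all, is likely the simplest way to avoid the problem.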
>
> -Dom
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>
Hi Dom,
          Thanks. I had made exactly the mistake you described in
Java. Now I am able to search on diacritics.
-Gnana Arasan.M
