Yeah, this is an encoding problem. So setting the encoding format to the 
Transformer helped.

Properties properties = transformer.getOutputProperties();
properties.setProperty(OutputKeys.ENCODING, "euc-kr");
transformer.setOutputProperties(properties);

ThankYou Very Much
 
-----Original Message-----
From: Christopher Sahnwaldt [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 19, 2006 4:11 PM
To: [email protected]
Cc: [email protected]; Serena A
Subject: Re: Problem in Parsing xml with Korean Characters

This is (most likely) not a problem of the parser, but an
encoding problem, so I'm moving this to the users list.

You should avoid dealing with byte arrays and byte streams 
(InputStream, OutputStream) when you're processing XML. 
Let the parser figure out the file encoding (it reads the 
encoding="euc-kr" part), otherwise use Strings and character 
streams (Reader, Writer).

This should work:

public static void main(String[] args) throws TransformerException 
{
  Document doc = load(new File("trial1.xml"));
  System.out.println(toString(doc));
}

private static Document load( File file ) throws TransformerException
{
  Transformer copy = TransformerFactory.newInstance().newTransformer();
  Source source = new StreamSource(file);
  DOMResult result = new DOMResult();
  copy.transform(source, result);
  return (Document)result.getNode();
}

private static String toString( Document doc ) throws TransformerException
{
  Transformer copy = TransformerFactory.newInstance().newTransformer();
  Source source = new DOMSource(doc);
  Writer writer = new StringWriter();
  Result result = new StreamResult(writer);
  copy.transform(source, result);
  return writer.toString();
}


Here are two articles that explain some of the background:

http://www.joelonsoftware.com/articles/Unicode.html
http://www.jorendorff.com/articles/unicode/java.html

Hope that helps,
Christopher.

> --- Ursprüngliche Nachricht ---
> Von: Sereena <[EMAIL PROTECTED]>
> An: [email protected]
> Betreff: Problem in Parsing xml with Korean Characters
> Datum: Tue, 18 Apr 2006 13:54:15 +0000 (UTC)
> 
> I am trying to parse an xml with Korean characters in it, but when some of
> the 
> korean characters are encountered, the parsing stops. If I remove the 
> characters causing problem, the rest of the xml is also parsed. Could
> anyone 
> help me to get this solved so that I can parse the whole xml with any
> korean 
> character in it?
> Please note that I am not getting any exception here, but the parsing
> stops.
> 
> The code would look like this : 
> 
> File data = new File("E://Folder1..//trial1.xml");
>                               int fileSize = (int) data.length();
>                               FileInputStream file = new FileInputStream
> (data);
>                               byte[] data2 = new byte[fileSize];
>                               
>                               
>                               for(int i=0; i < fileSize; i++ ) 
>                               {
>                                               data2[i] = (byte)
file.read();
> //                                            System.out.println(data2[i]);
>                               }
>                                                       
>                               file.close(); 
>                                 DocumentBuilderFactory dbf = 
> DocumentBuilderFactory.newInstance();
>                               DocumentBuilder db =
dbf.newDocumentBuilder();
>                                 doc = db.parse(new InputSource(new 
> ByteArrayInputStream(data2)));
> 
> //The following is to get the document in string format
>                                 System.out.println("Reconverting");
>                               byte [] removeResult=document2bytes
> (doc.getDocumentElement());
>                                 String result = new String(removeResult);
>                               System.out.println("Result =" + result);
> 
>                                
> System.out.println(encodingString("utf-8","iso-
> 8859-1",result));
> 
> 
> public static byte[] document2bytes(Node node) {
>                               try {
>                                       Source source = new DOMSource(node);
>                                               ByteArrayOutputStream out =
new 
> ByteArrayOutputStream();
>                                       StringWriter stringWriter = new 
> StringWriter();
>                                       Result result = new
StreamResult(out);
>                                       TransformerFactory factory = 
> TransformerFactory.newInstance();
>                                       Transformer transformer = 
> factory.newTransformer();
>                                       transformer.transform(source,
result);
>                                       return out.toByteArray();
>                               } catch (TransformerConfigurationException e)
{
>                                       e.printStackTrace();
>                               } catch (TransformerException e) {
>                                       e.printStackTrace();
>                               }
>                               return null;
>       }
> 
> public static String encodingString(String fromEnc, String toEnc, String
> value)
>                       throws IOException {
>                       if (value != null) {
>                               if ("iso-8859-1".equals(toEnc)) {
>                                       System.out.println("[encodeString] 
> value from static table cell element " + value);
>                                       value = new String(value.getBytes
> (), "UTF-8");
>                                       System.out.println("[encodeString] 
> Before encoding " + value);
>                                       value = escapingNCR(value, false);
>                                       System.out.println(" [encodeString] 
> After encoding NCR " + value);
>                               }
>                               else {
>                                       System.out.println("[encodeString] 
> Before encoding " + value);
>                                       ByteArrayInputStream bis = new 
> ByteArrayInputStream(value.getBytes());
>                                       ByteArrayOutputStream bos = new 
> ByteArrayOutputStream();
>                                       // Set up character stream
>                                       Reader r = new BufferedReader(new 
> InputStreamReader(bis, fromEnc));
>                                       Writer w = new BufferedWriter(new 
> OutputStreamWriter(bos, toEnc));
>                               
>                                       char[] buffer = new char[4096];
>                                       int len;
>                                       while ((len = r.read(buffer)) != -1)
>                                               w.write(buffer, 0, len);
>                                       r.close();
>                                       w.flush();
>                                       w.close();
>                                       value = bos.toString();
>                                       System.out.println("[encodeString] 
> After encoding " + value);
>                               }
>                       }
>                       return value;
>               }
> 
> 
> public static String escapingNCR(String str, boolean escapeAscii) 
>               {
>                  String ostr = new String();
> 
>                  for(int i=0; i<str.length(); i++) {
> 
>                         char ch = str.charAt(i);
>                         //System.out.println(new String(new char[]{ch}));
>       
>                         if (!escapeAscii && ((ch >= 0x0020) && (ch <= 
> 0x007e)) || specialSaveChars.indexOf(ch) >= 0) {
>                               ostr += ch ;
>                         }else {
>                               ostr += "&#x" ;
>                               String hex =
Integer.toHexString(str.charAt(i) 
> & 0xFFFF);
>                               if (hex.length() == 2) {
>                                       ostr += "00" ;
>                               }
>                               ostr += hex.toUpperCase(Locale.ENGLISH);
>                               ostr += ";";
>                         }
>                  }
> 
>                  return (ostr);
>               }
> 
> 
> The xml 'trial1.xml' that I parse could look like this:
> 
> ---------------------------------------------------
> <?xml version="1.0" encoding="euc-kr"?>
> <TrialXML>ÇÁ·©Å¬¸° ¾î±×·¹½Ãºê ±×·Î½º </TrialXML>
> ---------------------------------------------------
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-----------------------------------------------------------------------------------------------------------------------------
Disclaimer
-----------------------------------------------------------------------------------------------------------------------------

"This message(including attachment if any)is confidential and may be 
privileged.Before opening attachments please check them
for viruses and defects.MindTree Consulting Private Limited (MindTree)will not 
be responsible for any viruses or defects or
any forwarded attachments emanating either from within MindTree or outside.If 
you have received this message by mistake please notify the sender by return  
e-mail and delete this message from your system. Any unauthorized use or 
dissemination of this message in whole or in part is strictly prohibited.  
Please note that e-mails are susceptible to change and MindTree shall not be 
liable for any improper, untimely or incomplete transmission."

-----------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to