You don't want to call toString() on the byte array; you want to call it on
the ByteArrayOutputStream. Calling toString() on a byte[] only returns the
array's type and hash code, not its contents. Try this instead:

String pageContent;           // pageContent contains the HTML content
String str = null;
StringBufferInputStream sbis = new StringBufferInputStream(pageContent);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Tidy tidy = new Tidy();
try
{
    tidy.setXmlOut(true);
    tidy.setXHTML(false);
    tidy.setMakeClean(true);
    tidy.setTidyMark(false);
    tidy.setUpperCaseTags(false);
    tidy.setUpperCaseAttrs(false);
    tidy.setQuoteAmpersand(false);
    tidy.setNumEntities(true);
    tidy.setCharEncoding(Configuration.UTF8);

    tidy.parse(sbis, baos);
    // Decode the stream's contents as UTF-8, matching the Tidy encoding above.
    str = baos.toString("UTF-8");
    System.out.println("This is XHTML>>>>>>>>>>>>\n");
    System.out.println(str);
}
catch (Exception ex)
{
    ex.printStackTrace();     // report the failure instead of swallowing it
}
sbis.close();
baos.close();
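
To see the difference without Tidy in the picture, here is a minimal
standalone sketch using only the JDK. The class name ByteArrayToStringDemo
and the helper decode() are made up for illustration; they are not part of
Tidy or of the original servlet.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ByteArrayToStringDemo {

    // Hypothetical helper: decodes the bytes collected in the stream as UTF-8 text.
    static String decode(ByteArrayOutputStream baos) {
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write("<h3>This is for testing</h3>".getBytes(StandardCharsets.UTF_8));

        byte[] buff = baos.toByteArray();

        // Object.toString() on an array: type tag plus hash code, not the contents.
        System.out.println(buff.toString());

        // Either of these yields the actual text.
        System.out.println(decode(baos));
        System.out.println(baos.toString("UTF-8"));
    }
}
```

If you do need the byte array itself, new String(buff, "UTF-8") gives the
same text as baos.toString("UTF-8").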

  (*Chris*)
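
P.S. On handing the result to XSLTInputHandler without a temporary file:
the bytes in the ByteArrayOutputStream can be wrapped in a
ByteArrayInputStream, and from there in a SAX InputSource. Whether your
version of XSLTInputHandler accepts an InputSource is an assumption you
would need to check against its API; the sketch below deliberately uses only
JDK classes and shows just the in-memory hand-off, with the JDK's own XML
parser standing in as the consumer. The class name InMemoryXmlSource and the
helper toInputSource() are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InMemoryXmlSource {

    // Hypothetical helper: exposes the bytes held by the output stream
    // as a SAX InputSource, without touching the file system.
    static InputSource toInputSource(ByteArrayOutputStream baos) {
        InputSource source = new InputSource(new ByteArrayInputStream(baos.toByteArray()));
        source.setEncoding("UTF-8");   // assumes the bytes were written as UTF-8
        return source;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the output that Tidy would have written to baos.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        String xml = "<html><body><h3>This is for testing</h3></body></html>";
        baos.write(xml.getBytes(StandardCharsets.UTF_8));

        // Any consumer that takes an InputSource can read it; here the JDK parser does.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(toInputSource(baos));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```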


Original Message -----------------------
Hi all,

I have the content of a URL in a string, so the string contains HTML code. I
want to pass it to W3C Tidy, clean it, and store the result in a byte array.
I tried the following code snippet, but it shows error messages and outputs
some junk data. When I used a FileOutputStream and a temporary file instead
of a byte array, it worked fine. I also want to pass this byte array content
to XSLTInputHandler directly. How is that possible?

The following is my HTML code:

<html>
<title>
<head></head>
</title>
<body>
<h3>This is for testing</h3>
</body>
</html>

Servlet code snippet:

String pageContent;           //string pagecontent contains the html content
StringBufferInputStream sbis=new StringBufferInputStream(pageContent);
ByteArrayOutputStream baos=new ByteArrayOutputStream();
Tidy tidy = new Tidy();
try
{
    tidy.setXmlOut(true);
    tidy.setXHTML(false);
    tidy.setMakeClean(true);
    tidy.setTidyMark(false);
    tidy.setUpperCaseTags(false);
    tidy.setUpperCaseAttrs(false);
    tidy.setQuoteAmpersand(false);
    tidy.setNumEntities(true);
    tidy.setCharEncoding(Configuration.UTF8);

    tidy.parse(sbis,baos);
    byte [] buff=new byte[2048];
    buff=baos.toByteArray();
    str=(String)buff.toString();
    System.out.println("This is XHTML>>>>>>>>>>>>\n");
    System.out.println(str);
}
catch(Exception ex){}
sbis.close();
baos.close();


It produces the following output:

14:47:11,406 ERROR [STDERR]
Tidy (vers 4th August 2000) Parsing "InputStream"
14:47:11,421 ERROR [STDERR] line 3 column 1 - Warning: missing </title> before <head>
14:47:11,421 ERROR [STDERR] line 3 column 1 - Warning: <head> isn't allowed in <body> elements
14:47:11,421 ERROR [STDERR] line 3 column 7 - Warning: </head> isn't allowed in <body> elements
14:47:11,421 ERROR [STDERR] line 4 column 1 - Warning: discarding unexpected </title>
14:47:11,421 ERROR [STDERR] line 5 column 1 - Warning: <body> isn't allowed in <body> elements
14:47:11,437 ERROR [STDERR]
InputStream: Document content looks like HTML 3.2
14:47:11,437 ERROR [STDERR] 5 warnings/errors were found!
14:47:11,437 INFO  [STDOUT] This is XHTML>>>>>>>>>>>>
14:47:11,437 INFO  [STDOUT] [EMAIL PROTECTED]





