On 3/17/06, Jean-frederic Clere <[EMAIL PROTECTED]> wrote:
>
> Costin Manolache wrote:
>
> >Sorry, I forgot there are 2 meanings of 'xml syntax' :-), I was thinking
> >if the output is an xml file - with encoding in declaration, but in
> >regular jsp. (well, the patch is not dealing with jspx anyway )
> >I was referring to the fact that <?xml encoding="iso-8859-2"?> is treated
> >as template text, and pageEncoding (or web.xml ) takes precedence.
> >In jsp-xml ( jspx ) it seems we report an error if the web.xml encoding
> >doesn't match the <?xml?> encoding. I can't see many use cases for having
> >an explicit encoding in the xml header, and yet the file read with a
> >different encoding.
>
> In my case the xml header is:
> <?xml version="1.0" encoding="OSD_EBCDIC_DF04_1"?> (In EBCDIC...)
> Reading the file with ISO-8859-1 encoding only gives garbage.
>
> But the patch prevents reading the <%@page pageEncoding="bla" %> so it
> is bad.

Yes, the patch is bad - but what would be a good patch ?
- if pageEncoding is not specified but the document starts with
  <?xml encoding=...?> - use the xml encoding
- if pageEncoding is specified and so is <?xml encoding?> - report an error
  ( like jspx does ) or a warning, or choose the xml encoding
- leave current behavior - use default 8859-1 or pageEncoding only.

<?xml encoding?> is probably more used and supported ( i.e. more
'standard' :-) than jsp pageEncoding. The jsp spec is clear that the last
option should be used - but having 2 conflicting encodings is a source of
problems, and if we can't follow the 'higher' standard, we can at least warn.

Well - not a big deal, but encodings tend to be a headache area for many
people, in particular when different parts of the system have different
'standards' and defaults plus autodetections ( on browser, http, html, xml,
or jsp ).

Costin

> The old code should be improved so that it uses the sourceEnc when the
> pageEncoding is not specified, and ISO-8859-1 if neither is specified.
>
> Cheers
>
> Jean-Frederic
>
> >Costin
> >
> >On 3/17/06, Bill Barker <[EMAIL PROTECTED]> wrote:
> >
> >>>-----Original Message-----
> >>>From: Costin Manolache [mailto:[EMAIL PROTECTED]
> >>>Sent: Friday, March 17, 2006 11:57 AM
> >>>To: Tomcat Developers List
> >>>Subject: Re: svn commit: r386315 -
> >>>/tomcat/jasper/tc5.5.x/src/share/org/apache/jasper/compiler/ParserController.java
> >>>
> >>>In his example ( where both XML and JSP declare encodings ) - which one
> >>>would win ?
> >>
> >>The patch only affects pages in JSP syntax, so the <?xml ... ?> is just
> >>another piece of template text :).
> >>
> >>>IMO the XML encoding should win i.e. if the file uses xml syntax and
> >>>starts with <?xml version="1.0" encoding="iso-8859-2" ?>, then jsp
> >>>pageEncoding should be ignored.
> >>>If a jsp is written using the XML syntax - it is supposed to follow the
> >>>XML rules - there is no exception in the XML spec for jsps specifying
> >>>their different syntax for encoding.
> >>
> >>The JSP expert group agrees with you:). In XML syntax, the XML encoding
> >>should win out over <jsp:directive.page pageEncoding="..." />.
> >>
> >>>For non-XML jsps - I think respecting pageEncoding is a must, the jsp
> >>>reader must scan the file to find the pageEncoding string - which is not
> >>>trivial ( there is a reason why XML requires the encoding to be the first
> >>>thing in the file, at the top, I wouldn't bet on jasper implementing it
> >>>correctly :-)
> >>
> >>In JSP syntax, the spec (Appendix D) says that pageEncoding should win (at
> >>least when there is no matching <page-encoding /> in web.xml :). What the
> >>patch breaks is that with it Jasper won't even look for the pageEncoding
> >>most of the time.
> >>
> >>Jasper looks like it does a pretty good job of guessing to set up the
> >>Reader that scans for the pageEncoding directive. And JFC seems to agree,
> >>since the patch is to use the guessed encoding rather than the one that
> >>was specified :).
> >>
> >>>Costin
> >>>
> >>>On 3/17/06, Bill Barker <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>>-----Original Message-----
> >>>>>From: Jean-frederic Clere [mailto:[EMAIL PROTECTED]
> >>>>>Sent: Friday, March 17, 2006 4:13 AM
> >>>>>To: Tomcat Developers List
> >>>>>Subject: Re: svn commit: r386315 -
> >>>>>/tomcat/jasper/tc5.5.x/src/share/org/apache/jasper/compiler/ParserController.java
> >>>>>
> >>>>>Bill Barker wrote:
> >>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> >>>>>>>Sent: Thursday, March 16, 2006 3:55 AM
> >>>>>>>To: tomcat-dev@jakarta.apache.org
> >>>>>>>Subject: svn commit: r386315 -
> >>>>>>>/tomcat/jasper/tc5.5.x/src/share/org/apache/jasper/compiler/ParserController.java
> >>>>>>>
> >>>>>>>Author: jfclere
> >>>>>>>Date: Thu Mar 16 03:54:29 2006
> >>>>>>>New Revision: 386315
> >>>>>>>
> >>>>>>>URL: http://svn.apache.org/viewcvs?rev=386315&view=rev
> >>>>>>>Log:
> >>>>>>>If the encoding is not specified use the detected one.
> >>>>>>
> >>>>>>-1.
> >>>>>>If it gets to this point, the detected encoding is *wrong* (e.g.
> >>>>>><?xml version="1.0" encoding="iso-8859-2" ?> in JSP syntax).
> >>>>>
> >>>>>Why wrong?
> >>>>
> >>>>Because the right encoding is the one specified in the
> >>>><%@page pageEncoding="utf8"%>.
> >>>>
> >>>>>+++
> >>>>>Connected to localhost.
> >>>>>Escape character is '^]'.
> >>>>>GET /try1.jsp
> >>>>><?xml version="1.0" encoding="ISO-8859-2"?>
> >>>>><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> >>>>> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> >>>>>+++
> >>>>
> >>>>This is about pageEncoding, so I don't see the relevance.
> >>>>
> >>>>>>I don't have access to an EBCDIC machine to know what the problem is,
> >>>>>>but this isn't the fix. Possibly a better way to guess the encoding of
> >>>>>>the Reader?
> >>>>>
> >>>>>Thinking about it, the patch is not perfect but the old code is worse:
> >>>>>we have a piece of code that correctly detects the source encoding and
> >>>>>then destroys it...
> >>>>
> >>>>However, the old code adheres to the JSP spec, whereas your patch breaks
> >>>>the JSP spec (Appendix D). That automatically makes the old code better
> >>>>than your patch.
> >>>>
> >>>>>In doParse() in ParserController.java the following happens:
> >>>>>parse() is called with pageEnc = sourceEnc
> >>>>>jspConfigPageEnc = null
> >>>>>isDefaultPageEncoding = false.
> >>>>>But the line before, the jspReader uses the sourceEnc to create the
> >>>>>InputStreamReader so the content of the file is translated to utf-8
> >>>>>when reading it.
> >>>>>In validator.java the charset will be set to the detected encoding...
> >>>>>In the example above iso-8859-2. Bad for me that will be
> >>>>>OSD_EBCDIC_DF04_1.
> >>>>
> >>>>The only issue is why Jasper can't recognize your
> >>>><%@page pageEncoding="OSD_EBCDIC_DF04_1" %> statement. That's the part
> >>>>that I can't figure out (and your patch is masking :).
> >>>>
> >>>>>Cheers
> >>>>>
> >>>>>Jean-Frederic
> >>>>>
> >>>>>>This message is intended only for the use of the person(s) listed
> >>>>>>above as the intended recipient(s), and may contain information that
> >>>>>>is PRIVILEGED and CONFIDENTIAL.
> >>>>>>If you are not an intended recipient, you may not read, copy, or
> >>>>>>distribute this message or any attachment. If you received this
> >>>>>>communication in error, please notify us immediately by e-mail and
> >>>>>>then delete all copies of this message and any attachments.
> >>>>>>
> >>>>>>In addition you should be aware that ordinary (unencrypted) e-mail
> >>>>>>sent through the Internet is not secure. Do not send confidential or
> >>>>>>sensitive information, such as social security numbers, account
> >>>>>>numbers, personal identification numbers and passwords, to us via
> >>>>>>ordinary (unencrypted) e-mail.
> >>>>>>
> >>>>>>---------------------------------------------------------------------
> >>>>>>To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>>>>>For additional commands, e-mail: [EMAIL PROTECTED]
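
For what it's worth, the three options listed at the top of this message can be combined into one small decision routine. The sketch below is only an illustration of that idea, not Jasper code: the class name PageEncodingSketch, the method resolve() and its parameters are all hypothetical, and it assumes the two declared encodings have already been extracted from the page as plain strings. It keeps the spec-mandated winner (pageEncoding) for JSP syntax, falls back to the <?xml encoding?> value when no pageEncoding is given, and only records a warning when the two declarations disagree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Jasper.
public class PageEncodingSketch {

    /** Default for JSP-syntax pages when no encoding is declared anywhere. */
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    /**
     * Picks the encoding for a page in JSP (non-XML) syntax.
     *
     * @param pageEncoding value from the page directive or web.xml, or null
     * @param xmlEncoding  value from a leading XML declaration, or null
     * @param warnings     receives a message when the two declarations differ
     * @return the encoding name to use for reading the page
     */
    public static String resolve(String pageEncoding, String xmlEncoding,
                                 List<String> warnings) {
        if (pageEncoding != null) {
            // Plain case-insensitive comparison: charset aliases such as
            // "utf8" vs "UTF-8" are NOT recognised as equal here.
            if (xmlEncoding != null && !pageEncoding.equalsIgnoreCase(xmlEncoding)) {
                warnings.add("pageEncoding '" + pageEncoding
                        + "' conflicts with XML declaration encoding '"
                        + xmlEncoding + "'");
            }
            // pageEncoding still wins, as the thread says the spec requires.
            return pageEncoding;
        }
        if (xmlEncoding != null) {
            // No pageEncoding: fall back to the XML declaration.
            return xmlEncoding;
        }
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        System.out.println(resolve(null, "OSD_EBCDIC_DF04_1", warnings));
        System.out.println(resolve("UTF-8", "ISO-8859-2", warnings));
        System.out.println(resolve(null, null, warnings));
        System.out.println(warnings);
    }
}
```

Whether a conflict should be a warning or a hard error (as jspx apparently does) is exactly the open question in the thread; the sketch picks the warning only because it leaves the current precedence untouched.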