On Friday 22 June 2007, Arnaud HERITIER wrote:
> Be careful, because when you read an xml file with a reader (or an
> inputstream) instead of a path (or an url) you can't use relative entities
> in xml (because the parser can't know where the main doc is).

Yes, read(InputStream) is better than read(Reader) for encoding, but it does not help with relative entities.
Do you want me to add read(URL) at the same time, and document it as the preferred way of reading the model?
At first, for compatibility reasons, it would call read(InputStream), which in turn calls read(Reader); in the long term, it could be reimplemented to support relative entities.
WDYT?
Hervé

> This is not a problem because we discourage the usage of xml entities in
> our xml documents, but we have to continue to say it !!!
>
> Arnaud
>
> On 22/06/07, Hervé BOUTEMY <[EMAIL PROTECTED]> wrote:
> > On Friday 22 June 2007, Kenney Westerhof wrote:
> > > Hi,
> >
> > Hi,
> >
> > > indeed, it's a case of doing new XXXInputstream( something, "encoding" ),
> > > or a reader. Some work has been done on this, IIRC.
> > >
> > > The problem is that you need to prescan the xml declaration, so you start
> > > parsing until you get the first xml language element that is not a comment,
> > > (an xml element, in which case encoding is utf8, or
> > > a doctype declaration, encoding is utf8, or
> > > a processing instruction, and if it's the xml processing instruction parse
> > > the encoding attribute and use that, otherwise it's utf8).
> > >
> > > This isn't too hard to do, except you need to restart reading the xml file
> > > from start, if the encoding is not utf-8. The real problem is in the API's;
> > > you cannot take a reader and restart that, since you cannot change the
> > > encoding on an instantiated reader, and you certainly don't want to
> > > wrap it. You'd need access to a raw inputstream that doesn't apply
> > > encoding transformations to the bytes, and wrap that in a Pushback
> > > something and then rewrap it if you found the encoding.
> >
> > exactly, this is the job done by XmlReader in Rome:
> > https://rome.dev.java.net/apidocs/0_5/com/sun/syndication/io/XmlReader.html
> >
> > I have the class, well written and tested by Rome developers. My first
> > question is then: where to put it, to be able to use it in a lot of
> > places where there are Readers instantiated for XML streams?
> > plexus-utils? or make a dependency on Rome? or another place?
> >
> > > I'm a bit fuzzy on all the java.io api's, so we'll have to find the proper
> > > class to use in the API so we can do this; a File would work.
> > >
> > > Anyway, I once tried to fix this issue but the api had to be changed
> > > and there were just too many changes across plexus and maven at the
> > > time to push this through.
> >
> > With this class available, the change to Maven model can be backward
> > compatible:
> > - the old read(Reader) API remains for compatibility, but is deprecated
> > - a new read(InputStream) API is added, which calls read(new XmlReader(in))
> > The whole Maven code can then slowly migrate from the deprecated Reader API
> > to the new InputStream one, or use XmlReader if it is too hard to switch to
> > InputStream.
> > The only change is that there is a new dependency on this XmlReader class: I
> > don't know if it is a real problem or not.
> >
> > I searched a little bit, this new API addition could be done individually in
> > each .mdo file. But of course integrating it into the code generation
> > mechanism of Modello would be a lot better: Jason, if your proposal to have
> > access to Modello is still valid, I'm interested.
> >
> > Regards
> >
> > Hervé
> >
> > > -- Kenney
> > >
> > > Hervé BOUTEMY wrote:
> > > > On Thursday 21 June 2007, Jason van Zyl wrote:
> > > >> It seems like there are many problems with encoding that could be
> > > >> easily solved with a couple tweaks to modello, specifically the
> > > >> reader and writer so I've scheduled these for 2.0.8. There are some
> > > >> patches for these and hopefully Herve will work his magic with his
> > > >> suggested fix. I like the idea of borrowing the idea from the Rome
> > > >> IO utils to find the right encoding by default. That could easily be
> > > >> integrated into modello. Herve if you need access to Modello we can
> > > >> set you up.
> > > >
> > > > I'm interested in working on that.
> > > > Do I need Modello access, or other
> > > > components? I don't really know, these Modello things are the parts I
> > > > didn't really dive into for the moment.
> > > > The magic of the idea is that the encoding handling is not done by
> > > > the parser, but by the reader. Then, the code that has to change is the
> > > > code creating the Reader from a File: it must be changed from "new
> > > > FileReader(file)" to "new XmlReader(file)".
> > > >
> > > > We need to:
> > > > 1. choose where we put the XmlReader so that any code can use it when
> > > > necessary. Or have a dependency on Rome: but all Rome for only 1
> > > > class (even if this class is really great)...
> > > > 2. change every code that creates a Reader for XML parsing
> > > >
> > > > WDYT?
> > > >
> > > >> Thanks,
> > > >>
> > > >> Jason
> > > >>
> > > >> ----------------------------------------------------------
> > > >> Jason van Zyl
> > > >> Founder and PMC Chair, Apache Maven
> > > >> jason at sonatype dot com
> > > >> ----------------------------------------------------------
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > >> For additional commands, e-mail: [EMAIL PROTECTED]
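
For readers following the thread, the prescan Kenney describes (peek at the XML declaration on the raw bytes, push them back, then wrap the stream in a Reader with the detected encoding) could be sketched roughly as below. This is not Rome's XmlReader and not Modello code: the class name XmlEncodingSniffer, its methods and the 1024-byte limit are all made up for illustration. The sketch only copes with ASCII-compatible encodings and falls back to UTF-8; it ignores byte-order marks and UTF-16, which a real implementation would have to consider.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not Rome's XmlReader).
final class XmlEncodingSniffer {
    // Assumption: the XML declaration fits in the first 1024 bytes.
    private static final int PROLOG_LIMIT = 1024;

    // Matches encoding="..." inside a leading <?xml ... ?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Peeks at the start of the stream, then pushes the bytes back so the
    // caller can still read the document from its first byte.
    static String detectEncoding(PushbackInputStream in) throws IOException {
        byte[] buf = new byte[PROLOG_LIMIT];
        int n = 0;
        int r;
        while (n < buf.length && (r = in.read(buf, n, buf.length - n)) > 0) {
            n += r;
        }
        if (n > 0) {
            in.unread(buf, 0, n);
        }
        // ISO-8859-1 maps every byte to one char, so the ASCII declaration
        // can be inspected without knowing the real encoding yet.
        String head = new String(buf, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        return m.find() ? m.group(1) : "UTF-8";
    }

    // Wraps a raw stream in a Reader that uses the declared encoding.
    static Reader newXmlReader(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, PROLOG_LIMIT);
        return new InputStreamReader(in, detectEncoding(in));
    }

    // Convenience for trying it out: decodes a whole XML byte array.
    static String decode(byte[] xml) {
        try (Reader reader = newXmlReader(new ByteArrayInputStream(xml))) {
            StringBuilder out = new StringBuilder();
            char[] chunk = new char[256];
            int r;
            while ((r = reader.read(chunk)) != -1) {
                out.append(chunk, 0, r);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this available, a read(InputStream) method could simply delegate to the existing read(Reader) by passing newXmlReader(in), which is the backward-compatible shape proposed in the thread.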