Re: [jibx-users] Ignoring HTML (and other) tags in elements: How?

Dennis Sosnoski Tue, 17 Apr 2007 16:14:57 -0700

Hi Johannes,

The problem is actually at the parser level, not at the JiBX level. When 
you're parsing an XML document, there's no way to tell the parser that 
you just want the content of a particular element as a string rather 
than as XML (or HTML) components. That leaves two options. If the 
embedded HTML is well-formed, you can represent it as a DOM. Otherwise, 
you can put it in a CDATA section in the XML, in which case the HTML 
does not need to be well-formed (HTML is often not well-formed, with 
things such as <br> used instead of <br/>, <p> start tags with no 
matching </p> end tag, etc.).
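
To illustrate the parser-level behavior, here is a minimal sketch that uses 
the JDK's StAX API directly rather than JiBX. The class name 
ElementTextDemo and the helper rootTextOrNull are made up for this example. 
It reads the root element as plain text: this works when the HTML is wrapped 
in CDATA or the angle brackets are escaped, and fails as soon as the parser 
sees a real child element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; it is not part of JiBX.
class ElementTextDemo {

    /**
     * Returns the text content of the root element, or null if the parser
     * cannot deliver it as plain text (child elements or malformed XML).
     */
    static String rootTextOrNull(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                // Advance to the root element's start tag.
                while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    // skip anything before the root element
                }
                // Throws XMLStreamException if a child start tag is found.
                return reader.getElementText();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // CDATA: the parser hands the markup back as ordinary characters.
        System.out.println(rootTextOrNull(
                "<description><![CDATA[<p>Line one<br>Line two]]></description>"));
        // prints: <p>Line one<br>Line two

        // Escaped angle brackets are also plain text to the parser.
        System.out.println(rootTextOrNull(
                "<description>&lt; | -)&gt; smile</description>"));
        // prints: < | -)> smile

        // Real child elements cannot be read as a single string.
        System.out.println(rootTextOrNull(
                "<description><p>Line one</p></description>"));
        // prints: null
    }
}
```

The same constraint applies to any XML parser that sits underneath a data 
binding framework, which is why the content has to be CDATA-wrapped or 
escaped in the document itself, or else handled as a DOM.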


  - Dennis

Dennis M. Sosnoski
SOA and Web Services in Java
Training and Consulting
http://www.sosnoski.com - http://www.sosnoski.co.nz
Seattle, WA +1-425-939-0576 - Wellington, NZ +64-4-298-6117



Johannes Müller wrote:
> Hi,
>
> how can I get JiBX to ignore HTML (and other) tags in elements?
>
> The situation is as follows:
>
> There are (lots of) JiBX binding elements like
>
> <value style="element" name="description" field="descr" />
> <value style="element" name="foo" field="bar" />
>
> and all of the bound Java fields are of type java.lang.String.
>
> In the XML, the "description" and "foo" (and so on) elements can contain, 
> e.g., HTML tags, as in
>
> <description>
>   <html>
>     <head>
>        This is the header of an HTML description.
>     </head>
>     <body>
>        And this is the body of an HTML description.
>     </body>
>   </html>
> </description>
>
> or they can contain fancy substrings that look like tags, as in
>
> <description>
>   < | -)> This is a smiling chinese with a spikey chin.
> </description>
>
> Those tags must not be interpreted by JiBX, but instead, the whole contents 
> of the XML elements shall be stored without change in the corresponding 
> String fields (like it is defined in the binding definition).
>
> This means that the first example given shall lead to a String containing 
> the following:
>
>   <html>
>     <head>
>        This is the header of an HTML description.
>     </head>
>     <body>
>        And this is the body of an HTML description.
>     </body>
>   </html>
>
> I do not want to further disassemble the Strings containing HTML (or fancy) 
> tags.
>
>
>
> The approaches I found to deal with this situation, such as those discussed in 
>
> http://www.mail-archive.com/jibx-users@lists.sourceforge.net/msg02079.html
>
> seem quite complicated ("Include it in a CDATA section, hack the JiBX code, 
> or unmarshall it into DOM structure"). 
>
> Isn't there a simpler way to let JiBX relax and simply store all the contents 
> of an XML element in a Java String (without looking for substrings that look 
> like XML tags)?
>
> Of course, during marshalling, the contents of the Strings have to be written 
> into the bound XML elements without change, too.
>
> Maybe there could be an attribute like ignoresubtags="true" in the JiBX 
> binding definitions of a future release, if there is no simple solution to 
> this situation yet.
>
> Thank you for your help,
>
> Johannes
>
>
>

_______________________________________________
jibx-users mailing list
jibx-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jibx-users
