Re: XML with large text entries are broken down to chunks when parsing with Axiom - non-coalescing mode.

Hiranya Jayathilaka Sat, 01 May 2010 02:44:55 -0700

Hi Kasun,

On Sat, May 1, 2010 at 11:03 AM, Kasun Indrasiri <[email protected]> wrote:


> Hi,
>
> I guess this becomes even riskier in a scenario like this.
>
> XML string :  "<a>a_lengthy_string</a>" -> omElem
>
> Once we parse this XML in non-coalescing mode and create an OM
> element (omElem) from it,
>
> - first Child : contains the first portion of 'a_lengthy_string' string
> - last Child : contains the rest
>
> However, as Hiranya mentioned, 'omElem.getText()' will give us the correct
> value of the text content.
>
> Is this the acceptable behavior?
>

Yes. It seems if you are using non-coalescing mode, you should use the
getText() method to retrieve the full text from elements.
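
To illustrate why a concatenating accessor is the safe choice, here is a
minimal sketch that uses only the JDK's StAX API, not Axiom itself. The
class name CoalescingDemo and the method textOf are made up for this
example. It appends every CHARACTERS and CDATA event to a buffer, so the
result does not depend on how the parser chunks the text. This is only
similar in spirit to what OMElement#getText offers; it makes no claim
about how Axiom implements that method.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingDemo {

    /**
     * Hypothetical helper: returns all text found in the document,
     * concatenated in document order, regardless of how many text
     * events the parser decides to report.
     */
    static String textOf(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Same property as "javax.xml.stream.isCoalescing" in the thread.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);

        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                // In non-coalescing mode a long text node may arrive as
                // several CHARACTERS events, and CDATA sections may be
                // reported as separate CDATA events. Append them all.
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<a>some text <![CDATA[and a CDATA section]]> and more</a>";
        // Both calls should print the same full text content.
        System.out.println(textOf(xml, false));
        System.out.println(textOf(xml, true));
    }
}
```

Code that reads only the first text child of an element is the kind of
code that breaks once the parser starts splitting text into chunks.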

Thanks,
Hiranya


>
> regards,
>
> Kasun
>
>
> On Fri, Apr 30, 2010 at 9:12 PM, Andreas Veithen
> <[email protected]> wrote:
>
> > Axiom always creates the nodes based on the events received from the
> > underlying parser. If javax.xml.stream.isCoalescing is set to false on
> > the parser, then by definition the parser may return large text nodes
> > in multiple chunks. The problem is that if
> > javax.xml.stream.isCoalescing is set to true, StAX doesn't report
> > CDATA sections in the document as CDATA events, but as CHARACTER
> > events. It is however possible to configure Woodstox to report CDATA
> > sections without splitting text nodes into chunks. Note that even with
> > such a configuration, OMElement#getText should always be used to
> > extract the text content of an element (to cover the case where the
> > element contains a mix of text nodes and CDATA sections).
> >
> > Note that while coalescing is switched off by default at the StAX
> > level, Axiom overrides this so that by default coalescing is turned on
> > [1]. It is not surprising that there is code that implicitly relies on
> > this. Therefore, working with Axiom in non-coalescing mode is always a
> > risk.
> >
> > Andreas
> >
> > [1] http://people.apache.org/~veithen/axiom/userguide/ch04.html#d0e866
> >
> > On Fri, Apr 30, 2010 at 11:51, Kasun Indrasiri <[email protected]> wrote:
> > > Hi,
> > >
> > > When parsing XML in non-coalescing mode ("javax.xml.stream.isCoalescing",
> > > false) Axiom breaks down large text entries to multiple chunks. Therefore
> > > CDATA elements with lengthy texts get translated into multiple CDATA
> > > elements.
> > >
> > > thanks,
> > > --
> > > Kasun Indrasiri
> > > Senior Software Engineer,
> > > WSO2 Inc. - "Lean . Enterprise . Middleware" - http://www.wso2.com/
> > > Blog : http://kasunpanorama.blogspot.com/
> > >
> >
>
>
>
> --
> Kasun Indrasiri
> Senior Software Engineer,
> WSO2 Inc. - "Lean . Enterprise . Middleware" - http://www.wso2.com/
> Blog : http://kasunpanorama.blogspot.com/
>



-- 
Hiranya Jayathilaka
Software Engineer;
WSO2 Inc.;  http://wso2.org
E-mail: [email protected];  Mobile: +94 77 633 3491
Blog: http://techfeast-hiranya.blogspot.com
