[ 
https://issues.apache.org/jira/browse/DAFFODIL-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831427#comment-17831427
 ] 

Steve Lawrence commented on DAFFODIL-2884:
------------------------------------------

It feels like maybe there are two similar but orthogonal concepts here:

1) Is parsed/unparsed data valid XML?

2) How should that XML string data be projected into and out of the infoset?

For example, we might be projecting into a JSON infoset where XML data just 
becomes a JSON string, but we still want to know whether it is valid in order 
to allow for backtracking. And the reverse: one could imagine having JSON in 
our data stream that we want to ensure is valid JSON, while still projecting 
it to an XML infoset where it is just a normal string.

So it feels like we don't really want to change our stringAsXml runtime 
property implementation for the XML Text infoset inputter/outputter, since 
that handles concept 2. But we do need a new concept to allow 
validating/normalizing xs:string data.

One option could be new DPath functions, e.g. dfdlx:isXML(xs:string) or 
dfdlx:isJSON(xs:string). These functions could be used in a discriminator to 
backtrack if the parsed string isn't valid XML. They could also be implemented 
as UDFs in current versions of Daffodil, but this is probably something we 
should include.
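
As a rough illustration of what such predicates would check, here is a minimal 
sketch in Python using only the standard library. It is not Daffodil code and 
not a UDF; the names is_xml and is_json are hypothetical stand-ins for the 
proposed dfdlx:isXML and dfdlx:isJSON, and "valid" is assumed here to mean 
"well-formed", with no schema validation.

```python
import json
import xml.etree.ElementTree as ET


def is_xml(text: str) -> bool:
    """Hypothetical stand-in for dfdlx:isXML: True if text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_json(text: str) -> bool:
    """Hypothetical stand-in for dfdlx:isJSON: True if text parses as JSON.

    Note: Python's json module also accepts NaN and Infinity, which strict
    JSON does not, so a real implementation may need to be stricter.
    """
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
```

The point is only that the check yields a boolean, so a false result can fail 
a discriminator and trigger backtracking instead of escalating to an SDE.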

Another option is maybe a new property, e.g. dfdlx:textStringRep, which could 
have values of "standard" (i.e. current behavior), "xml", "json", or maybe 
more. This would do what you describe and validate/normalize the parsed string 
value according to the representation. The infoset value would still be a 
string, and infoset outputters/inputters would still need logic to determine 
how best to project them into the infoset using runtimeProperties.
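
To make the idea concrete, here is a minimal sketch in Python (standard 
library only, not Daffodil code) of a per-representation check that returns a 
string either way. The function name check_and_normalize, the DataParseError 
class, and the choice of what "normalize" means (re-serializing the parsed 
value) are all assumptions for illustration; the representation names come 
from the values suggested above.

```python
import json
import xml.etree.ElementTree as ET


class DataParseError(Exception):
    """Hypothetical stand-in for a parse error that allows backtracking."""


def check_and_normalize(text: str, rep: str) -> str:
    """Validate text according to rep and return a (possibly normalized) string."""
    if rep == "standard":
        # Current behavior: the string is taken as-is.
        return text
    if rep == "xml":
        try:
            root = ET.fromstring(text)
        except ET.ParseError as e:
            raise DataParseError(f"malformed XML: {e}") from e
        # Assumed normalization: re-serialize the parsed element tree.
        return ET.tostring(root, encoding="unicode")
    if rep == "json":
        try:
            value = json.loads(text)
        except ValueError as e:
            raise DataParseError(f"malformed JSON: {e}") from e
        # Assumed normalization: compact form without insignificant whitespace.
        return json.dumps(value, separators=(",", ":"), ensure_ascii=False)
    raise ValueError(f"unknown representation: {rep}")
```

Malformed data raises the recoverable error at parse time, while the result 
stays a plain string that an infoset outputter can still project however it 
likes.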

Any other approaches?


> String-As-XML cause SDE on malformed XML data. Needs to be PE.
> --------------------------------------------------------------
>
>                 Key: DAFFODIL-2884
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2884
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>    Affects Versions: 3.6.0
>            Reporter: Mike Beckerle
>            Priority: Major
>
> When using the string-as-XML feature, currently if the string that is 
> supposed to be XML is malformed, then a WstxUnexpectedCharException (or other 
> similar exception) gets thrown in the InfosetOutputter which is what does the 
> string-of-XML to actual XML conversion. The InfosetOutputter is outside the 
> scope of backtracking, so this error cannot be converted into a ParseError at 
> this point. The InfosetOutputter currently escalates this to an SDE.
> That's not correct for a data problem. The parser could be speculating down a 
> path where the string of data that is supposed to be XML is just gibberish. 
> If that string is malformed XML, a Parse Error needs to occur so we can 
> backtrack. 
> Converting Infoset into XML is normally something done by the 
> InfosetOutputter, but in this case it cannot be.  It needs to be done in the 
> string parser, and the Infoset needs to somehow cache the resulting XML so it 
> can be handed off to the InfosetOutputter. 
> I think this has to work analogously to text numbers. We parse the string 
> first, then convert to the data type, which for numbers is an 
> integer/float/decimal, etc. This conversion can fail, and that's a Parse 
> Error. String-as-XML needs to work the same way. The string is parsed via one 
> of the lengthKind techniques, then it is converted into XML. If the 
> conversion to XML fails, then it's a Parse Error. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
