On Tue, Mar 10, 2015 at 08:23:20PM -0700, travis+ml-lang...@subspacefield.org 
wrote:
> Regarding XML parsing, I ran across this:
> 
> http://codewhitesec.blogspot.de/2015/03/exploiting-hidden-saxon-xslt-parser-in.html
> 
> It's not exactly what I meant but seemed close enough to post.

Here are more details: parsing a string of XML as "events", from left to right:

http://tutorials.jenkov.com/java-xml/stax-xmleventreader.html

You can take action on a start element or end element.

For a tag with nested tags in it, whether you take action (semantics)
on the start or the end event determines whether that action occurs
before or after the actions for the nested tags.  Supposing that the
parent and child tags modify the same state variable (e.g.
this.outputfile = tag.value), whichever assignment runs last will
clobber the state variable (this.outputfile) and end up controlling it.
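
A minimal sketch of that ordering effect, using Python's standard
xml.etree.ElementTree.iterparse rather than the Java StAX reader from the
link above.  The document, the "file" tag, the "value" attribute, and the
helper name final_outputfile are all made up for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: a parent and a nested child both carry a "value".
DOC = '<file value="parent.txt"><file value="child.txt"/></file>'

def final_outputfile(xml_text, act_on):
    """Return the state variable after acting only on `act_on` events."""
    outputfile = None  # stands in for this.outputfile
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == act_on and elem.tag == "file":
            outputfile = elem.get("value")  # later assignments clobber earlier ones
    return outputfile

# Event order is: parent start, child start, child end, parent end.
on_start = final_outputfile(DOC, "start")  # child's start comes last
on_end = final_outputfile(DOC, "end")      # parent's end comes last
```

Acting on start events leaves the child in control of the variable;
acting on end events leaves the parent in control.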

This comes up ALL THE TIME on the web, where the first or last
occurrence of a field clobbers some state variable.  In HTTP alone,
Accept-Charset, Content-Type, and the filename= attribute all come to mind.
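
As a rough illustration (not taken from any particular HTTP stack), here
are two hypothetical consumers of the same list of header fields, one
keeping the first occurrence and one keeping the last.  The header values
and function names are invented for the example:

```python
# The same field appears twice; which value "counts" depends on the consumer.
headers = [
    ("Content-Type", "text/plain"),
    ("Content-Type", "text/html"),
]

def first_wins(pairs):
    """Keep the first occurrence of each field name (case-insensitive)."""
    result = {}
    for name, value in pairs:
        result.setdefault(name.lower(), value)
    return result

def last_wins(pairs):
    """Keep the last occurrence: each later assignment clobbers the earlier one."""
    result = {}
    for name, value in pairs:
        result[name.lower()] = value
    return result

seen_by_filter = first_wins(headers)["content-type"]
seen_by_renderer = last_wins(headers)["content-type"]
```

If a filter reads the field one way and the component behind it reads it
the other way, they disagree about what the message says.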

Now, multiply that parser diversity across all available XML parsing
libraries - those that generate a DOM or AST and traverse it,
event-based ones, and so on - and you have a pretty tangled mess where
the same language, parsed the exact same way, leads to many results.
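
To make that concrete, here is a small sketch with two consumers of the
same well-formed input, one tree-based and one event-based, both from
Python's standard xml.etree.ElementTree.  The "config" and "outputfile"
element names and both function names are assumptions for the example:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input with a repeated field.
DOC = "<config><outputfile>a.txt</outputfile><outputfile>b.txt</outputfile></config>"

def tree_consumer(xml_text):
    """Build the whole tree, then ask for the field; find() returns the first match."""
    root = ET.fromstring(xml_text)
    node = root.find("outputfile")
    return node.text if node is not None else None

def event_consumer(xml_text):
    """Stream end events and overwrite the state variable each time."""
    outputfile = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "outputfile":
            outputfile = elem.text  # last occurrence wins
    return outputfile

from_tree = tree_consumer(DOC)
from_events = event_consumer(DOC)
```

Both consumers accept the same syntax and see the same elements, yet they
end up with different values for the same field.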

These are different parse trees and require extra information to
disambiguate:

(Time) flies like an arrow,
(Fruit flies) like a banana.

But what I'm after may be described not as parsing syntax (yacc grammar),
but as semantics (the software we write with the BNF), perhaps by analogy
with pronoun assignment:

(The teacher) gave (the student) (her) (yellow cake uranium).

Where "her" is clearly a pronoun (and thus a noun phrase) in either
case.  Our software, working with this parse tree, may have chosen:

her.antecedent = "the student"
OR
her.antecedent = "the teacher"
...
predator.go_fetch(her.antecedent)
-- 
http://www.subspacefield.org/~travis/
"Computer crime, the glamor crime of the 1970s, will become in the
1980s one of the greatest sources of preventable business loss."
John M. Carroll, "Computer Security", first edition cover flap, 1977

_______________________________________________
langsec-discuss mailing list
langsec-discuss@mail.langsec.org
https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
