Re: [go-nuts] XML Parsing Nested Elements

lesmond Tue, 07 Nov 2017 08:19:32 -0800

Thank you for the incredibly detailed response.  It has really helped me 
understand the situation.


I actually started with an iterative approach with a Decoder object and 
this got very complex, very quickly.  It worked but the code was unworkable 
going forwards. I thought it might be worth trying this approach with an 
Unmarshaller.

I didn't think of ignoring the namespace prefixes.  You are right and after 
checking over the definitions there are no conflicting names at all so this 
works well.

Once again, many thanks.  I'm sure the detailed write-up will help others.

On Tuesday, November 7, 2017 at 4:07:02 PM UTC, Konstantin Khomoutov wrote:
>
> On Tue, Nov 07, 2017 at 03:35:45AM -0800, les...@gmail.com wrote: 
> > I am really struggling to access nested elements of an XML string and 
> > suspect it is down to the namespaces.  This string is obtained from a 
> > larger document and is the "innerXML" of some elements.  A simplified 
> > version is at... 
> > 
> > I could probably do this with multiple structs but want to have this in 
> > a single struct. 
> > 
> > https://play.golang.org/p/Een-guMNP9 
> > 
> > I can seem to read things at the root but cannot get them using the ">" 
> > syntax at all.  What am I doing wrong?  Can I "insert" a namespace 
> > element to assist it at all? 
> > 
> > I have manually removed the namespaces from this example to show what I 
> > think should happen!? 
> > https://play.golang.org/p/eCzbzgBYMq 
>
> The chief problem with your approach is lack of error checking. 
> The encoding/xml.Unmarshal() function returns an error value. 
> Had you checked it for being set (not nil), it would have given you an 
> immediate idea of what was wrong with your approach. 
>
> Regarding namespaces, your hunch is correct: since your XML document is 
> a fragment extracted from another document by a seemingly "textual" 
> method, all those "XML namespace prefixes" — parts in the names of the 
> elements which come before the ':' characters — have no meaning to the 
> XML parser since they are not defined in the document itself. 
>
> Unfortunately, currently there's no way to somehow explicitly define 
> them anywhere (say, in an instance of encoding/xml.Decoder) before 
> decoding, so you basically have three options: 
>
> - Somehow textually stick their definitions on the top element of your 
>   XML document fragment, so, say, it reads something like 
>
>     <fdm:trackInformation xmlns:fdm="urn:whatever:ns1" 
>          xmlns:nxcm="http://example.com/another/namespace/uri/" 
>          ...> 
>
>   …and then parse the resulting document into a value of a struct 
>   type the tags on whose fields contain full namespaces in the names 
>   of the XML elements they're supposed to decode. 
>
> - Use an iterative approach by creating an instance of 
>   encoding/xml.Decoder and calling its Token() method. 
>
>   When it returns a token of type StartElement or EndElement, 
>   the token's Name field can be examined to see what its "Space" and 
>   "Local" fields are. 
>
> - Ignore the XML namespace prefixes completely. 
>
>   In your case this appears to be the simplest solution as the 
>   names of the elements appear to be unique anyway. 
>
> The variant which checks for errors, ignores the XML namespace prefixes 
> and also defines the field named "XMLName" on the type to check the 
> name of the element it's supposed to unmarshal can be implemented 
> as follows: 
>
> --------------------------------8<-------------------------------- 
>     package main 
>     
>     import ( 
>             "encoding/xml" 
>             "log" 
>     ) 
>     
>     type TrackInformation struct { 
>             XMLName struct{} `xml:"trackInformation"` 
>     
>             TimeAtPosition string `xml:"timeAtPosition"` 
>             Speed          int    `xml:"speed"` 
>     
>             DepApt string `xml:"qualifiedAircraftId>departurePoint>airport"` 
>             ArrApt string `xml:"qualifiedAircraftId>arrivalPoint>airport"` 
>             Gufi   string `xml:"qualifiedAircraftId>gufi"` 
>     } 
>     
>     func main() { 
>     
>             xmlToParse := ` 
>     <fdm:trackInformation> 
>             <nxcm:qualifiedAircraftId> 
>                     <nxce:aircraftId>TEST</nxce:aircraftId> 
>                     <nxce:gufi>KR32642300</nxce:gufi> 
>                     <nxce:departurePoint> 
>                             <nxce:airport>KJFK</nxce:airport> 
>                     </nxce:departurePoint> 
>                     <nxce:arrivalPoint> 
>                             <nxce:airport>KJFK</nxce:airport> 
>                     </nxce:arrivalPoint> 
>             </nxcm:qualifiedAircraftId> 
>             <nxcm:speed>245</nxcm:speed> 
>     
>             <nxcm:timeAtPosition>2017-11-07T11:20:43Z</nxcm:timeAtPosition> 
>     </fdm:trackInformation>` 
>     
>             var trackInfo TrackInformation 
>             err := xml.Unmarshal([]byte(xmlToParse), &trackInfo) 
>             if err != nil { 
>                     log.Fatal(err) 
>             } 
>             log.Println(trackInfo) 
>     } 
> --------------------------------8<-------------------------------- 
>
> Playground [1]. 
>
>
> A couple more notes. 
>
> - You can't use namespaces when defining the names of the nested 
>   elements.  The wording of the documentation is a bit vague but it does 
>   explicitly state this: «If the XML element contains a sub-element 
>   whose name matches the prefix of a tag formatted as "a" or "a>b>c"…» — 
>   notice that "the prefix of a tag" bit which actually means "the local 
>   name of an element". 
>
>   So when you need to match on full names of the elements, you'd have to 
>   use nested structs so that each field stands for an element without 
>   nesting, and the nesting is defined via your types rather than 
>   tags on their fields. 
>
> - The XML decoder implements a "strict" mode, which is "on" by default. 
>
>   What's interesting about it is that even when it's on, it turns a 
>   blind eye to undefined XML namespace prefixes: «Strict mode does not 
>   enforce the requirements of the XML name spaces TR. In particular it 
>   does not reject name space tags using undefined prefixes. Such tags 
>   are recorded with the unknown prefix as the name space URL.» 
>
>   This means that you can use your undefined namespace prefixes "as is" 
>   when decoding. [2] demonstrates this approach applied to the top-level 
>   XML elements.  You can't do this for that "a>b>c" notation in the tags 
>   but you still can apply it when implementing parsing using the nested 
>   types. 
>
> - Another trick up the sleeve of the XML decoder is support for custom 
>   unmarshaling functions for your custom types. 
>
>   Any of your types (such as TrackInformation) can implement a method 
>
>     UnmarshalXML(d *xml.Decoder, start xml.StartElement) error 
>
>   to make that type implement the encoding/xml.Unmarshaler interface. 
>
>   When the decoder sees that a type implements this interface, it calls 
>   the UnmarshalXML method instead of dealing with the element's contents 
>   itself. 
>
>   It follows that you can have a hierarchy of low-level unexported 
>   types and a top-level "facade" type defining UnmarshalXML which 
>   internally first unmarshals the element using that hierarchy of types 
>   and then populates your "facade" type with the information that ended 
>   up in that hierarchy of values. 
>
>
> Hope this helps. 
>
> 1. https://play.golang.org/p/KJvvWg9apu 
> 2. https://play.golang.org/p/AR5vDTKX0Q 
>
>
