In cases where performance is critical, I think you'd be best off avoiding XPath altogether. If you use a simple SAX parser, you can process whatever elements you need as they stream by and ignore everything else, which saves processing time and memory. With XPath, I imagine the entire document has to be parsed and in memory, then the XPath expression interpreted and evaluated, before you get a result. (The fact that this is the only model I can come up with may be a failure of imagination, however.) XPath is incredibly powerful and flexible, but you don't need all that (and its overhead) if you're always executing one of a small number of queries that you know in advance. An optimal Xerces SAX parser might well be more efficient than libxml parsing + XPath evaluation.
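A minimal sketch of the streaming approach described above, not a definitive implementation. It uses Python's standard-library `xml.sax` only to keep the example self-contained; a Xerces SAX2 handler would likely have the same shape, with start-element, characters, and end-element callbacks. The element names `recordType` and `destination`, the `PeekHandler` class, and the `peek` helper are all hypothetical and chosen for illustration.

```python
import xml.sax


class _Done(Exception):
    """Raised to stop parsing once every wanted element has been seen."""


class PeekHandler(xml.sax.ContentHandler):
    """Collects the text of a few named elements and ignores everything else."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}
        self._current = None  # name of the wanted element we are inside, if any
        self._chunks = []

    def startElement(self, name, attrs):
        if name in self.wanted and name not in self.found:
            self._current = name
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._current is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._current:
            self.found[name] = "".join(self._chunks).strip()
            self._current = None
            if len(self.found) == len(self.wanted):
                raise _Done()  # nothing left to look for; skip the rest


def peek(xml_bytes, wanted):
    """Return {element name: text} for the wanted elements (hypothetical helper)."""
    handler = PeekHandler(wanted)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except _Done:
        pass
    return handler.found


# Hypothetical document layout, for illustration only.
SAMPLE = b"""<record>
  <recordType>image</recordType>
  <destination>archive-2</destination>
  <payload>lots of data the pipeline never needs to look at</payload>
</record>"""

info = peek(SAMPLE, ["recordType", "destination"])
```

No tree is built here: only the text of the wanted elements is kept, and the handler abandons the parse as soon as it has everything it was asked for.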
Of course, XPath is a useful thing to have in your toolbox, and if you include it now, you'll have that much more flexibility in the future.

-----Original Message-----
From: Will Sappington [mailto:[EMAIL PROTECTED]
Sent: Monday, December 18, 2006 12:22 PM
To: [email protected]
Subject: Xerces & libXml (was: Re: Was it something I said?)

Thanks for this info, this is the sort of thing I've been trying to find without much success, a comparison of XML libraries describing their strengths and weaknesses at accomplishing certain tasks. If anyone knows where such a thing exists, or has experience to share, it would be much appreciated.

We actually may have two conflicting sets of requirements. We provide a medical record/image archival/retrieval service and XML is being used to attach metadata to the records/images as part of the process. On the front end where the XML is being generated, and analyzed in detail upon retrieval/updating, the kind of robustness and interoperability you're talking about below is probably the prime consideration. But in the pipeline that transports this data back and forth, we only need to peek into the XML for a few tags to figure out what kind of record/image it is, what destination it needs to be sent to, and so on. We are moving enormous amounts of data and have end-user response time requirements, so speed is a critical factor.

We may have to consider using both if libxml doesn't properly support the more sophisticated stuff we may need to do on the front end. It's definitely faster (initial results indicate about 3x), which makes it the leading candidate for the back end, and it is small/light enough for what we want to do with the configuration files. We'll have to see about the front end stuff.

Thanks again for your insights, Jesse.

-will

-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED]
Sent: Monday, December 18, 2006 8:57 AM
To: [email protected]
Subject: RE: Was it something I said?
(was: proper way to use Xerces for what I'm trying to do)

Libxml is a great library with somewhat different goals than Xerces. I don't think it's explicitly stated on the Web site, but Xerces and other projects that build on it tend to implement W3C standards (DOM, XML Schema), while libxml implements what its maintainer prefers (a unique API, RelaxNG), with a focus on efficiency. Both approaches are reasonable, and which is appropriate depends on your needs.

In your shoes, if I were certain that lighting a cigarette is all I would ever need to do, I'd probably use libxml. In my experience, though, XML is useful for so many things that I'd probably want to be prepared to bake, boil, weld, and power fighter jets as well - in a variety of local languages. I'm a nut for portability, and a DOM interface has the advantage of being similar or identical in a wide range of environments (C++, C#, JavaScript, etc).

-----Original Message-----
From: Will Sappington [mailto:[EMAIL PROTECTED]
Sent: Friday, December 15, 2006 5:21 PM
To: [email protected]
Subject: RE: Was it something I said? (was: proper way to use Xerces for what I'm trying to do)

Turns out XPath will do exactly what I need to do - select a unique element (configuration item) from the file. We're going to XML for our configuration files because it lets you do more sophisticated stuff than a flat ini file parser can do. At this point it's sorta like using a blowtorch to light a cigarette, but there will likely be a need for it in the future. Problem is, Xerces doesn't support XPath on its own, you have to bolt Xalan or Pathan on top of it to do XPath. Now the blowtorch has become a flame thrower.

Since XML will be used throughout our system, not just for config files, and speed is critical in some places, we've decided to look at "libxml" instead.
It supports XPath on its own, has a fast XMLTextReader interface if you want to do a simple parse without building the whole DOM tree, and in general, looks like it will be better suited to what we're doing. We haven't decided on which one yet, but libxml looks pretty good from what I've done with it so far. If anyone has experience or knowledge of both and can shed some light on the advantages/disadvantages of either, I'd appreciate the insight.

-will

-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED]
Sent: Friday, December 15, 2006 3:47 PM
To: [email protected]
Subject: RE: Was it something I said? (was: proper way to use Xerces for what I'm trying to do)

I can hazard a guess as to why I didn't respond at the time, based on my re-reading of the message. My reaction to it just now was, "Hmm, interesting question. I haven't done anything quite like that, and it looks like it would take at least 15 or 20 minutes to craft a useful response. Maybe someone else will pipe up." If you get a list full of people with similar responses, it looks like your message has been completely ignored, though in fact it hasn't. It's been read, but not responded to. (In my experience, subscribers are reluctant to make replies like, "Sorry, can't help," unless they think that nobody on the list can help because you're barking up the wrong tree.)

I suppose someone could have said, "You seem to be on the right track. Good luck." That would have been courteous, but bear in mind that this list is an unorganized (distinct, I hope, from disorganized) collection of individuals who contribute for a variety of reasons, and there's no one who oversees it all with an eye toward courtesy or any other coherent behavior. I don't think you can attribute any particular motivation to such a diverse group.

So, a belated welcome aboard. I don't think you were rude, and I doubt anyone else intended to be.
;-)

-----Original Message-----
From: Will Sappington [mailto:[EMAIL PROTECTED]
Sent: Friday, December 15, 2006 3:30 PM
To: [email protected]
Subject: Was it something I said? (was: proper way to use Xerces for what I'm trying to do)

I'm curious. Almost 2 weeks ago I submitted the question below and got no response at all. I thought "OK, this must have been too much of a basic noob question, they probably want me to go do some more homework and come back when I can ask a more proper question". I didn't think the question was all that basic, I was asking about an overall approach to using Xerces to solve a problem, not how to install it, set up my build environment, or write the code. So I went ahead and figured out what I needed on my own.

Recently another new user has joined the group and has been asking for and getting help on some pretty basic stuff like setting up VC++ with the proper include paths for building and got a step-by-step "click this, press that" response. I don't begrudge him the help he has been getting in the slightest, in fact I'm glad to see that the group will actually offer help at that level. My only question is "Why was my question completely ignored?" Did I say something offensive, violate some rule of etiquette, something like that? Just curious.

-will

-----Original Message-----
From: Will Sappington
Sent: Monday, December 04, 2006 11:15 AM
To: '[email protected]'
Subject: proper way to use Xerces for what I'm trying to do

I'm new to both XML and Xerces, literally starting from scratch. From what I've seen in the documentation and some of the posts here, Xerces provides a lot of capabilities and there may be more than one way to accomplish a particular task. I'm trying to make sense of everything and I have an idea for an approach to the task I've been assigned, but I have no idea if it's a good approach or even the right way to do it. I'm hoping that someone here can help me so I don't spend too much time going down dead-end paths.
Here's what I'm trying to do. Some of our applications are configured with a hierarchical .ini file. There are 3 levels - application, section (within an application), and item (within a section). Users of the configuration utility class call a method getItem(appID, sectionName, itemName) to retrieve the value of the requested item. The .ini files are pretty standard flat text files with configuration items specified as name/val pairs.

We'd like to use XML instead of flat .ini files. I've gotten Xerces to build and run with the configuration utility; so far I can do pretty simple stuff like getElementsByTagName() and walk node lists, get their lengths and so on. I figure I can find a specific item by walking node lists, but what I'd really like to have is a way of directly accessing a unique element using its application/section/item names.

Someone here (my office) asked if I had tried an "xpath". I don't know what that is, and I'm gonna go find out, but I'm concerned that there may be an approach using XML that is fundamentally different than how you go about this using a flat file. That's what I'm looking to find out first, if there's a general "best" way to do this sort of thing, and then, whether yes or no, the specifics of how to implement it.

Any help will be greatly appreciated.

-will
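
The direct lookup asked about above can be sketched with a path expression. This is only an illustration under stated assumptions: the XML layout (`application`, `section`, and `item` elements carrying `id` and `name` attributes) is hypothetical, as is the `get_item` helper standing in for getItem(appID, sectionName, itemName). It uses Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath, purely to stay self-contained; a full XPath engine would accept the same kind of expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout mirroring the three .ini levels.
CONFIG = """<config>
  <application id="archiver">
    <section name="network">
      <item name="port">8080</item>
      <item name="host">localhost</item>
    </section>
  </application>
  <application id="viewer">
    <section name="network">
      <item name="port">9090</item>
    </section>
  </application>
</config>"""


def get_item(root, app_id, section_name, item_name):
    """Return the text of one item, or None if no such item exists.

    Assumes the three names contain no single quotes, since they are
    interpolated directly into the path expression.
    """
    path = (
        f"./application[@id='{app_id}']"
        f"/section[@name='{section_name}']"
        f"/item[@name='{item_name}']"
    )
    node = root.find(path)
    if node is None:
        return None
    return (node.text or "").strip()


root = ET.fromstring(CONFIG)
port = get_item(root, "archiver", "network", "port")
```

The three names map one-to-one onto the three steps of the path, so there is no node-list walking in the calling code.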
