On Wed, Dec 19, 2012 at 2:44 AM, thomasg <tho...@gstaedtner.net> wrote:
> On Wed, Dec 19, 2012 at 5:18 AM, Gustavo Sverzut Barbieri
> <barbi...@profusion.mobi> wrote:
> > On Wednesday, December 19, 2012, thomasg wrote:
> > > On Wed, Dec 19, 2012 at 4:38 AM, Gustavo Sverzut Barbieri
> > > <barbi...@profusion.mobi> wrote:
> > > > Hi Thomas,
> > > >
> > > > The standard way is pretty fast and lean, but it is a SAX-like
> > > > parser. That means you only get tokens; for the tags you need to
> > > > call yet another function to split the tag and arguments.
> > > >
> > > > It is good enough to parse SVG, as done by Esvg. It should also
> > > > be enough to parse config files and your chat.xml.
> > > >
> > > > There is also a version that creates nodes from XML. It's useful
> > > > for debugging and for simple cases without performance worries.
> > > > As you will very likely store your parsed data in a custom
> > > > structure rather than a generic "DOM", I recommend using the SAX
> > > > version.
> > > >
> > > > I didn't try the example with your XML, but it seems to be okay.
> > > > The example could use eina_strbuf instead of an array of strings,
> > > > but that's marginal. It could also use the size and avoid
> > > > strncmp(), but that is also marginal for an example.
> > > >
> > > > What exactly is failing?
> > >
> > > As you can see, the tags are totally wrong.
> > > They are neither correctly aligned (a <foo> can be closed with
> > > </bar> and not just </foo>), nor do the items correspond with the
> > > tags.
> > > So if the input is not 100% what the parser expects, say there's an
> > > additional level, the parser won't fail but will just receive
> > > totally wrong data.
> > > If I want to make sure that I get the data from tag
> > > <baz>DATA</baz>, I have to manually compare the string, and it
> > > seems that I might as well just parse it myself altogether.
> >
> > That is always the case with SAX. It allows you to handle errors
> > yourself (abort, auto-fix, etc.), as when parsing the bogus HTML that
> > is common on the Internet.
> >
> > I don't recall how strict I was with the tree/node version. I guess
> > that to make it usable by Evas textblock you can close tags with </>,
> > but I'm not sure what it would do if you specify an incorrect close
> > tag. Anyway, for a final version I'd recommend avoiding the
> > intermediate node tree and using SAX directly; then you get more
> > efficient data structures.
> >
> > Also consider always using the size. The original buffer is not
> > modified, so strings will not be null terminated.
> >
> > Usually the user of a SAX parser will keep a stack, and you can
> > validate based on that. But validate only if the data is untrusted.
> > Same for attributes: you only pay the price if you expect them for a
> > given tag. IOW it can be very efficient.
> >
> > The added benefit of using it over manual parsing is that it will
> > handle whitespace and also do minimal tag boundary matching. If a >
> > is missing, etc., it will emit errors.
>
> Hm, I guess I had/have some misconceptions about how a SAX parser is
> supposed to work.
> It just seemed like a terrible idea to take the data as it comes
> while ignoring half of it.

SAX is much like a tokenizer. However, most SAX parsers will hand you
new strings (either via strdup() or by modifying the input buffer) with
the actual tag. That is a bit easier to use than what I did in Eina's
parser, but Eina's is faster and lighter on memory. It does mean you
must honor the "size" argument when you get it.

The benefit of using a SAX parser shows when you have those large
config files that are composed of just tags (no attributes) and their
contents:

    <config>
      <item>
        <key>bla</key>
        <value>xyz</value>
      </item>
    </config>

You can create a list/array of My_Item structures with fields key and
value. If these are integers or enumerations, it's pretty simple to see
how much faster it can be: zero string creation. :-)
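To make the stack-plus-size idea concrete, here is a minimal, self-contained sketch. It does not use Eina's API: toy_parse() is a hypothetical stand-in tokenizer (no attributes, comments or entities), and Tok_Type, Tok_Cb, tok_eq(), Ctx and parse_config() are names invented for illustration. It also assumes integer keys and values, as in the "integers or enumerations" case above. What it shares with the approach described here is the shape of the callback: tokens arrive as (pointer, length) pairs into the unmodified buffer, the user keeps a stack of open tags, and nothing is copied.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds, standing in for what a SAX-like parser reports. */
typedef enum { TOK_OPEN, TOK_CLOSE, TOK_DATA } Tok_Type;

/* content points into the caller's buffer and is NOT NUL-terminated,
 * so len must always be used. Returning 0 aborts the parse. */
typedef int (*Tok_Cb)(void *data, Tok_Type type, const char *content, size_t len);

/* Compare a (pointer, length) token with a C string. */
static int tok_eq(const char *s, size_t len, const char *lit) {
   return strlen(lit) == len && memcmp(s, lit, len) == 0;
}

/* Toy tokenizer (assumption: no attributes, comments, CDATA or entities). */
static int toy_parse(const char *buf, size_t buflen, Tok_Cb cb, void *data) {
   size_t i = 0;
   while (i < buflen) {
      if (buf[i] == '<') {
         const char *gt = memchr(buf + i, '>', buflen - i);
         size_t start = i + 1, stop;
         Tok_Type type = TOK_OPEN;
         if (!gt) return 0; /* missing '>' is an error */
         stop = (size_t)(gt - buf);
         if (start < stop && buf[start] == '/') { type = TOK_CLOSE; start++; }
         if (!cb(data, type, buf + start, stop - start)) return 0;
         i = stop + 1;
      } else {
         size_t a = i, b;
         while (i < buflen && buf[i] != '<') i++;
         b = i;
         while (a < b && isspace((unsigned char)buf[a])) a++;     /* trim left */
         while (b > a && isspace((unsigned char)buf[b - 1])) b--; /* trim right */
         if (b > a && !cb(data, TOK_DATA, buf + a, b - a)) return 0;
      }
   }
   return 1;
}

enum { MAX_DEPTH = 8, MAX_ITEMS = 16 };
typedef struct { int key, value; } My_Item;

typedef struct {
   const char *tag[MAX_DEPTH]; /* open-tag names, pointing into the buffer */
   size_t tag_len[MAX_DEPTH];
   int depth, count;
   My_Item items[MAX_ITEMS];
} Ctx;

/* Decimal parse straight from the buffer: no copy, no NUL terminator. */
static int parse_int(const char *s, size_t len, int *out) {
   int v = 0;
   if (len == 0 || len > 9) return 0; /* 9 digits fit in a 32-bit int */
   for (size_t i = 0; i < len; i++) {
      if (s[i] < '0' || s[i] > '9') return 0;
      v = v * 10 + (s[i] - '0');
   }
   *out = v;
   return 1;
}

static int on_token(void *data, Tok_Type type, const char *s, size_t len) {
   Ctx *c = data;
   if (type == TOK_OPEN) {
      if (c->depth == MAX_DEPTH) return 0;
      /* <item> is only accepted as a direct child of the root tag. */
      if (tok_eq(s, len, "item") && (c->depth != 1 || c->count == MAX_ITEMS)) return 0;
      c->tag[c->depth] = s;
      c->tag_len[c->depth] = len;
      c->depth++;
   } else if (type == TOK_CLOSE) {
      /* Validation: a close tag must name the innermost open tag. */
      if (c->depth == 0) return 0;
      c->depth--;
      if (c->tag_len[c->depth] != len || memcmp(c->tag[c->depth], s, len) != 0) return 0;
      if (tok_eq(s, len, "item")) c->count++;
   } else if (c->depth == 3 && tok_eq(c->tag[1], c->tag_len[1], "item")) {
      /* Text directly inside <item><key> or <item><value>. */
      My_Item *it = &c->items[c->count];
      if (tok_eq(c->tag[2], c->tag_len[2], "key")) return parse_int(s, len, &it->key);
      if (tok_eq(c->tag[2], c->tag_len[2], "value")) return parse_int(s, len, &it->value);
   }
   return 1;
}

/* Returns 1 only if the callback accepted every token and all tags closed. */
int parse_config(const char *buf, size_t len, Ctx *c) {
   assert(buf && c);
   memset(c, 0, sizeof *c);
   return toy_parse(buf, len, on_token, c) && c->depth == 0;
}
```

Fed a config like the one above with integer contents (say <key>7</key> and <value>42</value>), parse_config() fills one My_Item without creating any string. A mismatched close tag such as <key>7</value>, or an <item> buried under an extra level, makes it return 0 instead of silently collecting wrong data. If the input is trusted, the stack comparison in the TOK_CLOSE branch is the part you could drop.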
If you need a more traditional approach, use eina_simple_xml_node_load()
http://docs.enlightenment.org/auto/eina/group__Eina__Simple__XML__Group.html#gadc951418424b679ea32ba63492894fe3
and eina_simple_xml_node_dump().

Test at
http://svn.enlightenment.org/svn/e/trunk/efl/src/tests/eina/eina_test_simple_xml_parser.c

> Then again, to me XML seems like a terrible idea in general :)

Indeed. The reason for having eina_simple_xml is to avoid pulling in
libxml2 and similar just to do some basic configuration parsing.
Ideally someone would convert Efreet's menu parser to use it.

-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202

_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel