On Wed, Dec 19, 2012 at 2:44 AM, thomasg <tho...@gstaedtner.net> wrote:

> On Wed, Dec 19, 2012 at 5:18 AM, Gustavo Sverzut Barbieri <
> barbi...@profusion.mobi> wrote:
>
> > On Wednesday, December 19, 2012, thomasg wrote:
> >
> > > On Wed, Dec 19, 2012 at 4:38 AM, Gustavo Sverzut Barbieri <
> > > barbi...@profusion.mobi> wrote:
> > >
> > > > Hi Thomas,
> > > >
> > > > The standard way is pretty fast and lean, but it is a SAX-like parser.
> > > > That means you only get tokens; for the tags you need to call yet
> > > > another function to split the tag and arguments.
> > > >
> > > > It is good enough to parse svg, as done by Esvg. Should be also enough
> > > > to parse config files and your chat.xml
> > > >
> > > > There is also a version that creates nodes from XML. It's useful to
> > > > debug and for simple cases without performance worries. As very likely
> > > > you will store your parsed data in a custom structure rather than a
> > > > generic "Dom", I recommend using the sax version.
> > > >
> > > > I didn't try the example with your XML, but seems to be okay. The
> > > > example could use eina_strbuf instead of array of strings, but that's
> > > > marginal. Also could use the size and avoid strncmp(), but also
> > > > marginal for an example.
> > > >
> > > > What is exactly failing?
> > > >
> > >
> > > As you can see, the tags are totally wrong.
> > > They are neither correctly aligned (a <foo> can be closed with </bar>
> > > and not just </foo>), nor do the items correspond with the tags.
> > > So if the input is not 100% like the parser expects it, say there's an
> > > additional level, the parser won't fail but just receive totally wrong
> > > data.
> > > If I want to make sure that I get the date from tag <baz>DATA</baz>, I
> > > have to manually compare the string and it seems that I might as well
> > > just parse it myself altogether.
> >
> >
> > That is always the case with sax. It allows you to handle errors
> > yourself, like abort, auto fix, etc. like parsing bogus HTML that is
> > common on the Internet.
> >
> > I don't recall how strict I was with the tree/node version. I guess to
> > make it usable by Evas textblock you can close tags with </>, but I'm not
> > sure what it would do if you specify an incorrect close tag. Anyway, for
> > a final version I'd recommend avoiding the intermediate node tree and
> > using sax directly; then you get more efficient data structures.
> >
> > Also consider always using the size. The original buffer is not modified,
> > so strings will not be null terminated.
> >
> > Usually the sax parser will keep a stack, and you can validate based on
> > that. But just validate if data is untrusted. Same for attributes, you
> > just pay the price if you expect them for such tag. IOW it can be very
> > efficient.
> >
> > The added benefit of using it over manual parse is that it will handle
> > whitespaces and also do minimal tag boundary match. If > is missing, etc.
> > that will emit errors.
> >
> >
> Hm, I guess I had/have some misconceptions on how a SAX parser was supposed
> to work.
> It just seemed like a terrible idea to just take the data as it comes while
> ignoring half of it.
>

SAX is much like a tokenizer. However, most parsers will hand you new strings
(either strdup() copies or by modifying the input buffer) with the actual tag.
That is a bit easier to use than what I did in eina's, but eina's is faster
and lighter on memory. It does mean you must honor the "size" argument when
you get a token.
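
To make the "size" point concrete, here is a minimal sketch in plain C. It is
not eina code: token_is() is a hypothetical helper I made up for illustration.
The idea is that a token pointing into the original buffer is followed by the
rest of the input rather than a NUL, so it has to be compared by length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of eina): compare a token given as
 * pointer + length, which is NOT NUL-terminated, against a C string.
 * Checking the length first avoids the prefix matches that a bare
 * strncmp(tok, name, len) would accept. */
int token_is(const char *tok, size_t len, const char *name)
{
    return strlen(name) == len && memcmp(tok, name, len) == 0;
}
```

With the input "<key>bla</key>", a tag token would point at the 'k' with a
length of 3; token_is(tok, 3, "key") is true, while "keys" and "ke" are both
rejected.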

The benefit of using a SAX parser shows when you have those large config
files that are composed of just tags and contents, without arguments:

    <config>
        <item>
            <key>bla</key>
            <value>xyz</value>
        </item>
    </config>

You can create a list/array of My_Item structures with fields key and
value. If these are integers or enumerations, it's pretty simple to see how
fast it can be: zero string creation. :-)
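
A rough sketch of that idea in plain C, assuming integer key and value. The
token types, Ctx, on_token() and the other helpers are stand-ins I invented
for illustration; they only mimic the shape of a SAX-style callback (type,
pointer, length) and are not eina's actual API.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 16

/* Stand-in token types; a real SAX-like parser supplies its own. */
typedef enum { TOK_OPEN, TOK_CLOSE, TOK_DATA } Tok_Type;

/* Which field the next data token belongs to. */
typedef enum { FIELD_NONE, FIELD_KEY, FIELD_VALUE } Field;

typedef struct { int key; int value; } My_Item;

typedef struct {
    My_Item items[MAX_ITEMS];
    size_t count;
    Field field;
} Ctx;

/* Length-aware comparison: tokens are not NUL-terminated. */
int tok_is(const char *s, size_t len, const char *name)
{
    return strlen(name) == len && memcmp(s, name, len) == 0;
}

/* Parse leading decimal digits from a span, without copying it. */
int span_to_int(const char *s, size_t len)
{
    int v = 0;
    for (size_t i = 0; i < len && s[i] >= '0' && s[i] <= '9'; i++)
        v = v * 10 + (s[i] - '0');
    return v;
}

/* Called once per token. Returns 0 to abort parsing, 1 to continue. */
int on_token(Ctx *c, Tok_Type type, const char *s, size_t len)
{
    switch (type) {
    case TOK_OPEN:
        if (tok_is(s, len, "item")) {
            if (c->count == MAX_ITEMS)
                return 0; /* no room left: abort */
            c->items[c->count].key = 0;
            c->items[c->count].value = 0;
            c->count++;
        } else if (tok_is(s, len, "key")) {
            c->field = FIELD_KEY;
        } else if (tok_is(s, len, "value")) {
            c->field = FIELD_VALUE;
        }
        break;
    case TOK_DATA:
        if (c->count == 0)
            break; /* data outside any <item>: ignore */
        if (c->field == FIELD_KEY)
            c->items[c->count - 1].key = span_to_int(s, len);
        else if (c->field == FIELD_VALUE)
            c->items[c->count - 1].value = span_to_int(s, len);
        break;
    case TOK_CLOSE:
        c->field = FIELD_NONE;
        break;
    }
    return 1;
}
```

Start from a zeroed Ctx and feed it one token at a time. Nothing is
allocated or copied per token: values are converted straight from the input
buffer into the array. This sketch does not check that close tags match;
that would need a small stack on top.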

If you need a more traditional approach, use eina_simple_xml_node_load()
http://docs.enlightenment.org/auto/eina/group__Eina__Simple__XML__Group.html#gadc951418424b679ea32ba63492894fe3
and eina_simple_xml_node_dump().

Test at
http://svn.enlightenment.org/svn/e/trunk/efl/src/tests/eina/eina_test_simple_xml_parser.c



> Then again, to me XML seems like a terrible idea in general :)
>


Indeed. The reason for having eina_simple_xml is to avoid pulling in libxml2
and similar just to do some basic configuration parsing. Ideally someone
would convert Efreet's menu parser to use it.


-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
