On Mon, Feb 28, 2011 at 1:48 PM, Cedric BAIL <cedric.b...@free.fr> wrote:
> On Mon, Feb 28, 2011 at 5:43 PM, Joerg Sonnenberger
> <jo...@britannica.bec.de> wrote:
>> On Mon, Feb 28, 2011 at 10:20:16AM -0300, Gustavo Sverzut Barbieri wrote:
>>> I've worked with expat before and it's way more complex to use and
>>> heavy. Sure, it will work and will handle namespace, and encoding...
>>> but many times you don't want or need it.
>>
>> The point is that if you don't do that, you no longer have an XML
>> parser. So don't even call it that. If you explicitly want to use only a
>> subset,

I don't want that; it's intentional. Like almost everybody in this
project, I hate XML to my core: it's pointless and inefficient in
both space and parsing. But, as intended, a sane subset of the
syntax is supported, which is more than enough for regular
configuration files, build systems, rules or even HTML.

How many HTML pages do you see declaring new entities? Of course,
when parsing HTML with it, it's better to use the SAX-like API so you
can handle the missing close tags automatically, since most people
don't close elements like <br> or <img>.
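
For instance, here's a tiny sketch (hypothetical code, not any actual
API) of the kind of check an open-tag callback can do to treat the
HTML void elements as self-closing, so an unclosed <br> never
unbalances your element stack:

#include <string.h>
#include <strings.h>

/* HTML "void" elements never take a close tag; if the open-tag
 * callback pops them right after pushing, the stack stays sane. */
static int
tag_is_void(const char *name)
{
   static const char *const v[] =
     { "br", "img", "hr", "input", "meta", "link" };
   size_t i;

   for (i = 0; i < sizeof(v) / sizeof(v[0]); i++)
     if (!strcasecmp(name, v[i])) return 1;
   return 0;
}

The callback just pushes the element and, when tag_is_void() says so,
pops it again immediately.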


>> going with JSON tends to be a much simpler option...

No, if you have a choice, go with EET; it's much simpler and more efficient.



>>> the current SAX-like API I'm providing is a single function that
>>> receives a buffer and a callback, and calls you back with pointers
>>> into the buffer you handed it. It does not consider any form of
>>> encoding, so it will never break; that's up to you. It falls back
>>> nicely on unhandled conditions: entity definitions are not handled,
>>> they are given to you as an open-tag statement. That is because MOST
>>> of these files are ASCII and do not use nasty XML features such as
>>> entities and the like.
>>
>> That doesn't work either. XML can't be parsed encoding neutral. Consider
>> documents in shift_jis for this. If you implement a fallback path to
>> handle all well formed XML documents using a full blown parser, you
>> haven't saved anything in terms of code complexity and the request for a
>> benchmark made in this thread is completely valid to justify the
>> *additional* complexity.

Check out: /usr/share/hal/fdi/*/*.fdi  and tell me what difference it
would make.

That's my problem with XML people: they can't tell the difference
between theory and reality. In theory you can construct all kinds of
corner cases to prove me wrong, but in practice we do just fine with
what we need.

The reality is that you just need to find < and >, with the exception
of <![CDATA[ ... ]]>, and most people don't even use CDATA. Most
files, although declared as UTF-8, are actually ASCII, with non-ASCII
characters converted to entities/escapes. If you can find a case
where handing over the raw UTF-8 strings would break it, then I'll
care to fix it.
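
To make it concrete, here is a minimal sketch of that "find < and >"
approach, CDATA exception included (hypothetical code, not the actual
implementation):

#include <stdio.h>
#include <string.h>

static void
on_token(const char *s, size_t len, int is_tag)
{
   printf("%s: %.*s\n", is_tag ? "tag" : "text", (int)len, s);
}

static void
simple_parse(const char *buf, size_t len)
{
   const char *p = buf, *end = buf + len;

   while (p < end)
     {
        const char *q;

        if (*p != '<') /* raw text: runs until the next '<' */
          {
             q = memchr(p, '<', end - p);
             if (!q) q = end;
             on_token(p, q - p, 0);
             p = q;
          }
        else if ((size_t)(end - p) >= 9 && !memcmp(p, "<![CDATA[", 9))
          {
             /* CDATA: skip to "]]>", ignoring any < or > inside */
             q = p + 9;
             for (;;)
               {
                  q = memchr(q, ']', end - q);
                  if (!q || (size_t)(end - q) < 3) return; /* unterminated */
                  if (!memcmp(q, "]]>", 3)) break;
                  q++;
               }
             on_token(p + 9, q - (p + 9), 0);
             p = q + 3;
          }
        else /* a tag: everything between '<' and '>' */
          {
             q = memchr(p, '>', end - p);
             if (!q) return; /* unterminated tag */
             on_token(p + 1, q - (p + 1), 1);
             p = q + 1;
          }
     }
}

No allocation, no encoding conversion: the callback only ever gets
pointers into the caller's buffer.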


>>> Even the escaping (&#123; or &amp;) is not handled; at least with
>>> EFL you're unlikely to need it anyway, as Evas already handles it
>>> for you.
>>
>> This sounds like moving complexity to the wrong layer, too. Ignoring
>> the question of whether a document editor should preserve entities or
>> not, most of the users of a "simple" parser shouldn't see entities at
>> all or have to deal with them. There is a good reason why human
>> editors want to use them.

Again, any real use case? As for entities, checking for them does
more harm than good:
   - you waste time looking for them;
   - you need to allocate memory to write the resulting bytes;
   - you now have a new problem: which encoding do you write to? If
the document is in ISO-8859-1, do you need to convert it to UTF-8
before expanding entities? But what if the user wants to keep it in
ISO-8859-1? Do you convert back? What do you do with characters that
set can't represent?
   - and what if your presentation layer, like Evas/Edje/Elementary,
already handles entities for you? Then you did all of the above for
nothing.

Most of the time we'll be reading configuration files with it, or the
results of XML-RPC calls. Usually you'll know exactly which fields
may contain entities and what to replace them with. Example: if
you're reading something that will become a URL, then for that field
alone you can convert straight to the %AB convention instead of
converting to UTF-8 first and then to the %AB format; a sketch of
that follows.
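
Something like this hypothetical helper (it only expands &amp; and
ASCII numeric references, which is the common case in practice):

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void
put_pct(unsigned char c, FILE *out)
{
   if (isalnum(c) || strchr("-._~", c))
     fputc(c, out);
   else
     fprintf(out, "%%%02X", c);
}

/* Take a field value as it sits in the XML buffer, possibly with
 * &amp; or &#123;/&#x7B; escapes, and emit it percent-encoded for a
 * URL in one pass, never building a decoded UTF-8 copy. */
static void
field_to_url(const char *s, size_t len, FILE *out)
{
   size_t i = 0;

   while (i < len)
     {
        if (s[i] == '&')
          {
             const char *semi = memchr(s + i, ';', len - i);

             if (semi && (size_t)(semi - (s + i)) == 4 &&
                 !memcmp(s + i, "&amp;", 5))
               {
                  put_pct('&', out);
                  i += 5;
                  continue;
               }
             if (semi && i + 2 < len && s[i + 1] == '#')
               {
                  /* numeric reference; ASCII only, for brevity */
                  long v = (s[i + 2] == 'x')
                     ? strtol(s + i + 3, NULL, 16)
                     : strtol(s + i + 2, NULL, 10);

                  if (v > 0 && v < 128)
                    {
                       put_pct((unsigned char)v, out);
                       i += (semi - (s + i)) + 1;
                       continue;
                    }
               }
          }
        put_pct((unsigned char)s[i], out);
        i++;
     }
}

Anything it doesn't recognize just passes through percent-encoded, so
nothing breaks, and you never pay for a generic entity table you
won't use.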


>> In short: if it doesn't implement XML, it is not an XML parser. Most of
>> the configuration files sadly using XML are exactly that. Providing a
>> simplified interface is fine, it doesn't require throwing compatibility
>> over board. If you don't want XML, consider something like Apple's
>> proplib or just JSON. Don't retrofit it into existing file formats.

I just call it XML because that's the term people will search for in
our docs. Otherwise it's pointless, as nobody will find it.



> We do need an XML parser for FreeDesktop files. They have really
> limited complexity and we can't change them. As for configuration
> files, we do have eet, which does the job pretty well for us.

Exactly, EET is the way to go for our controlled files.

But the system ships with XML files, and I dare you to show one
ordinary file on your system that is not parseable with this one.
FreeDesktop.org, Xorg, PolicyKit, HAL, GConf... all should work fine.


-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
