On Mon, Feb 28, 2011 at 1:48 PM, Cedric BAIL <cedric.b...@free.fr> wrote:
> On Mon, Feb 28, 2011 at 5:43 PM, Joerg Sonnenberger
> <jo...@britannica.bec.de> wrote:
>> On Mon, Feb 28, 2011 at 10:20:16AM -0300, Gustavo Sverzut Barbieri wrote:
>>> I've worked with expat before and it's way more complex to use and
>>> heavy. Sure, it will work and will handle namespaces, encodings...
>>> but many times you don't want or need it.
>>
>> The point is that if you don't do that, you no longer have an XML
>> parser. So don't even call it that. If you explicitly want to use only
>> a subset,
I don't want that, it's intentional. Like almost everybody in this
project I hate XML to my deepest feelings: it's pointless and
inefficient in both space and parsing time. But as expected, a sane
amount of the syntax is supported, more than enough for regular
configuration files, build systems, rules or even HTML. How many HTML
pages do you see declaring new entities? Of course, to parse HTML with
it, it's better to use the SAX mode so you can handle close-tags
yourself, as most people don't close things like <br> or <img>.

>> going with JSON tends to be a much simpler option...

No. If you have a choice, go with EET: it's much simpler and more
efficient.

>>> the current SAX-like API I'm calling is 1 single function that
>>> receives a buffer and a callback, and calls you back with pointers
>>> into the buffer you handed it. It does not consider any form of
>>> encoding, thus it will never break; it's up to you. It falls back
>>> nicely on unhandled conditions: entity definitions, for example, are
>>> not handled, they are given to you as an open-tag statement. That is
>>> because MOST of these files are ASCII and do not use nasty XML
>>> features such as entities and the like.
>>
>> That doesn't work either. XML can't be parsed encoding-neutral.
>> Consider documents in shift_jis for this. If you implement a fallback
>> path to handle all well-formed XML documents using a full-blown
>> parser, you haven't saved anything in terms of code complexity, and
>> the request for a benchmark made in this thread is completely valid
>> to justify the *additional* complexity.

Check out /usr/share/hal/fdi/*/*.fdi and tell me what difference it
would make. That's my problem with XML people: they can't tell the
difference between theory and reality. In theory you can build all
kinds of corner cases to prove me wrong, but reality shows that we can
do just fine for what we need.

Reality is that you just need to find < and >, with the exception of
<![CDATA[ ... ]]>, and most people don't even use CDATA. Most files,
although declared as UTF-8, are actually ASCII, with the non-ASCII
characters converted to entities/escapes. If you can find some case
where providing real UTF-8 strings would break it, then I'll care to
fix it.
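To be concrete, the whole idea fits in a loop like the sketch below.
The names (simple_xml_walk, Tag_Cb...) are made up just for this mail,
it's not the actual API, but it shows the spirit: pointers into your
buffer, no copies, no encoding or entity work.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef void (*Tag_Cb)(void *data, bool is_tag, const char *s, size_t len);

static void
simple_xml_walk(const char *buf, size_t len, Tag_Cb cb, void *data)
{
   const char *p = buf, *end = buf + len;

   while (p < end)
     {
        if (*p == '<')
          {
             if ((size_t)(end - p) >= 9 && memcmp(p, "<![CDATA[", 9) == 0)
               {
                  /* the one spot where a '>' does not close anything */
                  const char *q = p + 9;

                  while ((q = memchr(q, ']', end - q)) &&
                         ((size_t)(end - q) < 3 || memcmp(q, "]]>", 3) != 0))
                    q++;
                  if (!q) return; /* unterminated CDATA: give up */
                  cb(data, false, p + 9, q - (p + 9)); /* raw cdata bytes */
                  p = q + 3;
               }
             else
               {
                  const char *gt = memchr(p, '>', end - p);

                  if (!gt) return; /* unterminated tag: give up */
                  cb(data, true, p + 1, gt - (p + 1)); /* "foo a='1'", "/foo"... */
                  p = gt + 1;
               }
          }
        else
          {
             const char *lt = memchr(p, '<', end - p);

             if (!lt) lt = end;
             cb(data, false, p, lt - p); /* text between tags, untouched */
             p = lt;
          }
     }
}

static void
print_cb(void *data, bool is_tag, const char *s, size_t len)
{
   (void)data;
   printf("%s |%.*s|\n", is_tag ? "tag " : "text", (int)len, s);
}

int
main(void)
{
   const char xml[] = "<foo a='1'>hi<![CDATA[a < b]]><br/></foo>";

   simple_xml_walk(xml, sizeof(xml) - 1, print_cb, NULL);
   return 0;
}

That's the whole trick: tags and text come out as slices of the
original buffer, nothing is allocated, and the caller decides what, if
anything, to do about entities.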
>>> Even the escaping ({ or &) is not handled, at least with efl
>>> you're likely to not need it anyway as Evas already handles it for
>>> you.
>>
>> This sounds like moving complexity to the wrong layer, too. Ignoring
>> the question of whether a document editor should preserve entities or
>> not, most of the users of a "simple" parser shouldn't see entities at
>> all or have to deal with them. There is a good reason for human
>> editors wanting to use them.

Again, any real use case? As for entities, checking for them is more
harm than good:
 - you waste time looking for them;
 - you need to allocate memory to write the resulting bytes;
 - you now have a new problem: which encoding do you write to? If the
   document is in ISO-8859-1, do you need to convert it to UTF-8 before
   resolving entities? But what if the user wants to keep ISO-8859-1?
   Do you convert back? What do you do with characters that don't exist
   in that set?
 - and what if your presentation layer already handles entities for
   you, like Evas/Edje/Elementary? Then you did all of the above for
   what good?

Most of the time we'll be reading configuration files with it, or the
results of XML-RPC calls. Usually you'll know for sure which fields
could have entities and what to replace them with.

Example: if you're reading something that you'll turn into a URL, then
just for that field you can convert straight to the %AB convention,
instead of converting to UTF-8 first and then to the %AB format.

>> In short: if it doesn't implement XML, it is not an XML parser. Most
>> of the configuration files sadly using XML are exactly that.
>> Providing a simplified interface is fine, it doesn't require throwing
>> compatibility overboard. If you don't want XML, consider something
>> like Apple's proplib or just JSON. Don't retrofit it into existing
>> file formats.

I just name it XML as that's the term people will search for in our
docs. Otherwise it's pointless, as nobody will find it.

> We do need a XML parser for FreeDesktop files. They have a really
> limited complexity and we can't change them. As for configuration
> files, we do have eet, which does the job pretty well for us.

Exactly, EET is the way to go for the files we control. But the system
ships with XML files, and I dare you to show me one ordinary file on
your system that is not parseable with this one. FreeDesktop.org, Xorg,
PolicyKit, HAL, GConf... all should work fine.

-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
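PS: for the URL field example above, this is the kind of thing I mean.
percent_encode() below is a made-up helper just for this mail, not
something we ship, but it shows how a field's raw bytes (here
ISO-8859-1) can go straight to %XX with no UTF-8 round-trip at all:

#include <ctype.h>
#include <stdio.h>

static void
percent_encode(const char *in, char *out) /* out: 3 * strlen(in) + 1 */
{
   static const char hex[] = "0123456789ABCDEF";

   for (; *in; in++)
     {
        unsigned char c = (unsigned char)*in;

        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~')
          *out++ = (char)c;
        else
          {
             /* whatever the document's encoding, the byte goes out as-is */
             *out++ = '%';
             *out++ = hex[c >> 4];
             *out++ = hex[c & 0x0f];
          }
     }
   *out = '\0';
}

int
main(void)
{
   char out[64];

   percent_encode("caf\xe9 au lait", out); /* "café au lait" in ISO-8859-1 */
   puts(out); /* prints: caf%E9%20au%20lait */
   return 0;
}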