Re: r23598 - gnucash/trunk/src/backend/xml - Bug 710824 - GnuCash should sanitise UTF-8 before serialising files

John Ralls Thu, 26 Dec 2013 07:54:15 -0800

On Dec 26, 2013, at 5:41 AM, Derek Atkins <[email protected]> wrote:

> John Ralls <[email protected]> writes:
> 
>>>> Bug 710824 - GnuCash should sanitise UTF-8 before serialising files
>>>> 
>>>> to avoid writing broken unparseable XML.
>>>> This checks for both bad UTF8 and for invalid control characters
>>>> that libxml2 doesn't convert to entities.
>>> 
>>> Are we going to need a similar process for the SQL backend?
>>> 
>> 
>> I don’t think so. SQL won’t refuse to load a database because one
>> field has a character that doesn’t match some spec. In fact, it
>> doesn’t much care what you put into it; as far as the DB is concerned,
>> bytes is bytes.
> 
> Potentially true for the current set of databases, but it does mean that
> if you go from SQL -> XML -> SQL then the resulting second SQL will not
> be the same as the first.


Well, there are two "right" solutions. One is to get libxml2 to convert those
characters into entities; I'll see if there's already a bug for that and file
one if there isn't. The other is to filter them out at input, which I've
already done for OFX import. I can't think of a use case where those characters
would be useful in one of our fields. That filter should be extracted into an
input module that's called by everything that brings in text from outside of
GnuCash, including the GUI. After all, bug 710824 itself was probably caused by
a copy-and-paste error.
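
To make the input-filter idea concrete, here is a rough sketch in Python rather
than GnuCash's C. It is not the code from r23598 or from the OFX importer. The
function name sanitize_for_xml is hypothetical, and replacing malformed UTF-8
with U+FFFD while silently dropping disallowed control characters is an assumed
policy, not one settled in this thread.

```python
import re

# XML 1.0 permits only TAB (0x09), LF (0x0A) and CR (0x0D) below 0x20,
# and also excludes U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def sanitize_for_xml(raw: bytes) -> str:
    """Hypothetical input filter: return text that is safe to serialise as XML 1.0."""
    # Malformed UTF-8 sequences become U+FFFD instead of raising an error
    # (assumed policy; dropping them would be an equally valid choice).
    text = raw.decode("utf-8", errors="replace")
    # Drop the characters that XML 1.0 forbids outright.
    return _INVALID_XML_CHARS.sub("", text)
```

Calling one such function from every path that brings in outside text (file
import, paste into the GUI) would keep the XML and SQL backends storing the
same bytes, so a SQL -> XML -> SQL round trip would not alter the data.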

Regards,
John Ralls



_______________________________________________
gnucash-devel mailing list
[email protected]
https://lists.gnucash.org/mailman/listinfo/gnucash-devel

Re: r23598 - gnucash/trunk/src/backend/xml - Bug 710824 - GnuCash should sanitise UTF-8 before serialising files
