Re: [Bug-gnupod] Encoding of non-ascii characters in GNUtunesDB.xml

H. Langos Mon, 14 Apr 2008 11:13:23 -0700

Ok, here's the patch ...

It took longer than I thought because UTF-8 handling in Perl is a major pain.


cheers
-henrik

PS: The line "$xutf =~ tr/\000-\037//d;" is not without problems. It
deletes all control characters, including TAB, LF, and CR, even though
those three are valid XML characters.

Could somebody check out how iTunes handles those? Does it also remove 
those characters or does it convert them into &#9; and so on?
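
One possible way around this, as a rough sketch only: delete just the
control characters that XML 1.0 does not allow, and write TAB, LF and CR
as numeric character references so that an XML parser does not fold them
into spaces when it normalizes attribute values. The two helper names
below are made up for illustration and are not part of XMLhelper.pm, and
whether iTunes copes with such references is still the open question.

```perl
use strict;
use warnings;

# Hypothetical helper: delete only the C0 controls that XML 1.0 forbids,
# keeping TAB (0x09), LF (0x0A) and CR (0x0D).
sub strip_invalid_controls {
    my ($s) = @_;
    $s =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;
    return $s;
}

# Hypothetical helper: write the three permitted controls as decimal
# numeric character references (&#9; &#10; &#13;).
sub escape_whitespace_controls {
    my ($s) = @_;
    $s =~ s/([\x09\x0A\x0D])/'&#' . ord($1) . ';'/ge;
    return $s;
}

my $title = "Side\tA\x{01}\nTrack";
my $clean = strip_invalid_controls($title);
print escape_whitespace_controls($clean), "\n";   # Side&#9;A&#10;Track
```

The order matters only for readability here: stripping first means the
second substitution never sees a forbidden character.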


On Mon, Apr 14, 2008 at 02:14:18PM +0200, H. Langos wrote:
> Hi there,
> 
> I wonder if anybody else has the occasional problem with editing her/his
> GNUtunesDB.xml. 
> 
> Since it is XML and the encoding is UTF-8 you don't have any problem as
> long as your system is completely UTF-8 compliant. I, however, have a
> mixed ISO-8859-1, ISO-8859-15, and UTF-8 mess, and some of the editors 
> that I like to use are not very smart about handling the character 
> encoding.
> 
> It would be very easy to convert everything outside the ASCII range to 
> the XML escaped version. So say, instead of some garbage you'd see 
> "&#347;" where a "Latin Small Letter s with Acute" is.
> 
> Pro: GNUtunesDB.xml becomes a pure ascii file. No more editor/viewer 
>   issues.
> 
> Contra: The GNUtunesDB.xml becomes slightly bigger and for people with a
>   clean UTF-8 toolchain it becomes a little less readable. (Note: You can
>   still edit the file and insert native UTF-8 as you please.)
> 
> Any thoughts?
> 
> cheers
> -henrik
> 
> 
> 
> _______________________________________________
> Bug-gnupod mailing list
> [email protected]
> http://lists.nongnu.org/mailman/listinfo/bug-gnupod

commit 5ce6a9e9173dce95287ff4b15deda67b569dd365
Author: Heinrich Langos <[EMAIL PROTECTED]>
Date:   Mon Apr 14 19:49:54 2008 +0200

    Changed encoding of unicode characters outside of ascii range to XML notation.
    
    This change will make your GNUtunesDB.xml into a pure ascii file, making it
    easier to view and manipulate on non-utf8 capable systems.
    
    Note: "xescaped()" is not only called for attribute values but also for
    element names and attribute names. So if somebody comes up with non-ascii
    element names or attribute names we would have to treat those differently.

diff --git a/src/ext/XMLhelper.pm b/src/ext/XMLhelper.pm
index 5eaeb48..2a230a3 100755
--- a/src/ext/XMLhelper.pm
+++ b/src/ext/XMLhelper.pm
@@ -124,8 +124,15 @@ sub xescaped {
 	my $xutf = Unicode::String::utf8($ret)->utf8;
 	#Remove 0x00 - 0x1f chars (we don't need them)
 	$xutf =~ tr/\000-\037//d;
-	
-	return $xutf;
+	my $out = Unicode::String::utf8("")->utf8;
+	for (my $i = 0 ; $i < Unicode::String::utf8($xutf)->length ; $i++) {
+		if (Unicode::String::utf8($xutf)->substr($i,1)->ord > 127) {
+			$out .= '&#' . Unicode::String::utf8($xutf)->substr($i,1)->ord . ';';
+		} else {
+			$out .= Unicode::String::utf8($xutf)->substr($i,1) ;
+		}
+	}
+	return $out;
 }
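
For comparison, the same conversion can be sketched without the
per-character substr() loop. This is only an illustration and not what
the patch does: it swaps Unicode::String for the core Encode module and
assumes the input is a UTF-8 encoded byte string. The function name
ascii_with_char_refs is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of the patch: takes UTF-8 encoded bytes
# and returns pure ASCII, with every code point above 127 written as a
# decimal numeric character reference.
sub ascii_with_char_refs {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> Perl characters
    $chars =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $chars;
}

# U+015B (Latin Small Letter S with Acute) is the bytes C5 9B in UTF-8.
print ascii_with_char_refs("\xC5\x9Bwiat"), "\n";   # &#347;wiat
```

A single substitution walks the string once, whereas the loop in the
patch re-parses $xutf on every iteration.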
