Re: [gentoo-user] locating non utf-8 characters

Michael Tue, 03 Nov 2020 02:04:49 -0800

On Tuesday, 3 November 2020 05:46:41 GMT [email protected] wrote:
> I'm using sql-ledger, and when making a backup it uses the standard gzip program:
> $gzip = "gzip -S .gz";
> 
> The backup works with some datasets, but one dataset is giving me an
> error while trying to perform a backup:
> 
> Wide character in print at SL/AM.pm line 2044.
> Content-Type: application/file; Content-Disposition: attachment;
> filename=dataset_3-3.2.6-20201101.sql.gz
> ãù&ü_1604265628.dataset_3-3.2.6-20201101.sqlÏ\Yo;ñ~œØ –
> 
> Since sql-ledger files are standard UTF-8 files, I was thinking of using:
>  grep -axv '.*' file
> 
> would find all non-UTF-8 characters. And it did. I used "nano" to remove
> them, but I'm still getting the same error while performing the backup.
> 
> Any ideas?
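To see exactly where invalid bytes sit, `grep -naxv '.*' file` adds line numbers to the same search. For finer detail, here is a minimal Python sketch (Python is used only for illustration, and the function name `find_invalid_utf8` is hypothetical) that reports the line number, byte offset and offending bytes of each invalid UTF-8 sequence. It assumes the file you pass is plain text, such as an uncompressed SQL dump, not the .gz archive.

```python
import sys


def find_invalid_utf8(data: bytes):
    """Yield (line_number, byte_offset, bad_bytes) for every invalid UTF-8
    sequence in data. Line numbers start at 1; offsets count from byte 0."""
    offset = 0  # byte offset of the start of the current line
    # Splitting on b"\n" is safe: byte 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of this line is valid UTF-8
            except UnicodeDecodeError as err:
                # err.start / err.end are relative to the slice line[pos:].
                start, end = pos + err.start, pos + err.end
                yield line_no, offset + start, line[start:end]
                pos = end  # resume scanning after the bad bytes
        offset += len(line) + 1  # +1 for the b"\n" removed by split()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_invalid_utf8.py dump.sql
    with open(sys.argv[1], "rb") as fh:
        for line_no, off, bad in find_invalid_utf8(fh.read()):
            print(f"line {line_no}, byte {off}: {bad!r}")
```

Unlike the grep search, this lists each bad sequence separately, so a line with several problems appears several times.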


I have not used sql-ledger, but have come across the following two symptoms 
which may be relevant to your problem.

1. A SQL database which had been created with an MS Windows application was
using UTF-16 instead of UTF-8.  This added a UTF-16 null character at the start
of the SQL dump, which messed up the output.  The offending character showed up
clearly as a block when inspecting the dump with 'less' on Linux (with the
default UTF-8 character encoding) and could be deleted with a text editor.  I
don't think this relates to your problem, but I am mentioning it for
completeness.

2. The word "print" in the reported error is a hint you should follow up.
Perl, which is used by sql-ledger, converts bytes to characters and can be set
to use UTF-8 encoding.  However, its conversion does not get things right every
time: when it concatenates a byte string with a character string, it treats
the bytes as Latin-1 and can mistranslate them.  The "Wide character in print"
warning itself means that a string containing characters above U+00FF was
printed to a filehandle with no encoding layer set.  You could fix this by
setting both the input *and* output encoding to UTF-8.  A good explanation of
the problem and suitable solutions can be found here:

https://www.ahinea.com/en/tech/perl-unicode-struggle.html
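For the first symptom (a stray UTF-16 character at the start of a dump), the clean-up can also be scripted instead of done in an editor. This is a minimal Python sketch under stated assumptions: a leading UTF-16 byte order mark is taken to mean the whole file is UTF-16, anything else is treated as UTF-8, and only NUL characters at the very start are removed. The name `normalise_dump` is hypothetical.

```python
import codecs
import sys


def normalise_dump(data: bytes) -> bytes:
    """Return a SQL dump re-encoded as UTF-8, without BOM or leading NULs.

    Assumption: a leading UTF-16 BOM means the whole file is UTF-16;
    anything else is treated as UTF-8 (with or without a BOM). A UTF-32 LE
    BOM (FF FE 00 00) would be mistaken for UTF-16 LE. Invalid UTF-8 raises
    UnicodeDecodeError rather than being silently altered.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        text = data.decode("utf-16")
    else:
        # "utf-8-sig" strips a UTF-8 BOM if present, otherwise acts like "utf-8".
        text = data.decode("utf-8-sig")
    # Remove any NUL characters left at the very start of the dump.
    return text.lstrip("\x00").encode("utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python normalise_dump.py input.sql output.sql
    with open(sys.argv[1], "rb") as src:
        cleaned = normalise_dump(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(cleaned)
```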
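The concatenation mistranslation described in point 2 can be reproduced outside Perl. The Python sketch below only imitates the mechanism: Perl's implicit upgrade is modelled by decoding raw bytes as Latin-1, and `perl_style_upgrade` and `build_label` are hypothetical helpers, not part of any Perl or Python API. It shows why decoding input as UTF-8 and encoding output as UTF-8 exactly once avoids double-encoded text. In Perl itself, the corresponding fix is to decode on input and give output filehandles an `:encoding(UTF-8)` layer, for example with `binmode`.

```python
def perl_style_upgrade(raw: bytes) -> str:
    """Hypothetical helper imitating Perl's implicit upgrade: each byte of an
    undecoded string is taken as one Latin-1 (ISO-8859-1) character."""
    return raw.decode("latin-1")


def build_label(raw_name: bytes, decode_input: bool) -> bytes:
    """Join a character-string prefix with a name held as raw UTF-8 bytes,
    then encode the result once as UTF-8 for output."""
    prefix = "Name: "  # a character string
    if decode_input:
        name = raw_name.decode("utf-8")  # correct: input decoded as UTF-8
    else:
        name = perl_style_upgrade(raw_name)  # wrong: bytes guessed as Latin-1
    return (prefix + name).encode("utf-8")  # output encoded as UTF-8


raw = "Z\u00fcrich".encode("utf-8")  # b"Z\xc3\xbcrich", e.g. as read from the database
print(build_label(raw, decode_input=False))  # b'Name: Z\xc3\x83\xc2\xbcrich' (double-encoded)
print(build_label(raw, decode_input=True))   # b'Name: Z\xc3\xbcrich'
```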
