On Tuesday, 3 November 2020 05:46:41 GMT [email protected] wrote:
> I'm using sql-ledger and while making a backup it uses the standard gzip program:
> $gzip = "gzip -S .gz";
>
> The backup works with some datasets, but one dataset is giving me an
> error while trying to perform the backup:
>
> Wide character in print at SL/AM.pm line 2044.
> Content-Type: application/file; Content-Disposition: attachment;
> filename=dataset_3-3.2.6-20201101.sql.gz
> ãù&ü_1604265628.dataset_3-3.2.6-20201101.sqlÏ\Yo;ñ~œØ –
>
> Since sql-ledger files are standard UTF-8 files, I was thinking that using:
> grep -axv '.*' file
>
> would find all non-UTF-8 characters. And it did. I used "nano" to remove
> them but I'm still getting the same error while performing the backup.
>
> Any ideas?

I have not used sql-ledger, but I have come across the following two symptoms, which may be relevant to your problem.

1. A SQL database which was created with an MSWindows application was using UTF-16 instead of UTF-8. This added a UTF-16 null character at the start of the SQL dump, which messed up the output. The offending character was obvious as a block when inspecting the dump with 'less' in Linux with its default UTF-8 character encoding, and it could be deleted with a text editor. I don't think this relates to your problem, but I am mentioning it for completeness.

2. The word "print" in the reported error gives a hint you should follow up. Perl, which is used by sql-ledger, converts bytes to characters and can be set to use UTF-8 encoding. However, its conversion algorithm does not get things right every time, and when it concatenates strings it can mistranslate them. You could fix this by setting both the input *and* the output encoding to UTF-8.

A good explanation of the problem, with suitable solutions, is given here:

https://www.ahinea.com/en/tech/perl-unicode-struggle.html
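
For what it's worth, the two checks discussed in this thread (a stray marker at the start of the dump, and lines that are not valid UTF-8, which is what the grep command looks for) could be sketched in Python roughly as follows. This is only an illustration, not anything taken from sql-ledger: the function names leading_bom and invalid_utf8_lines are made up for this example, and it assumes the stray leading character is a byte-order mark, which may not match what was actually in the dump.

```python
import codecs

# Byte-order marks checked at the start of the data.
# Assumption: only the UTF-8 and UTF-16 marks are of interest here.
BOMS = {
    codecs.BOM_UTF8: "UTF-8 BOM",
    codecs.BOM_UTF16_LE: "UTF-16 LE BOM",
    codecs.BOM_UTF16_BE: "UTF-16 BE BOM",
}


def leading_bom(data: bytes):
    """Return a label for a byte-order mark at the start of data, or None."""
    for mark, label in BOMS.items():
        if data.startswith(mark):
            return label
    return None


def invalid_utf8_lines(data: bytes):
    """Return the 1-based numbers of lines that do not decode as UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Hypothetical sample: a UTF-16 LE mark, then a Latin-1 byte on line 2.
    sample = codecs.BOM_UTF16_LE + b"INSERT 1;\nname \xe9\nok\n"
    print(leading_bom(sample))
    print(invalid_utf8_lines(sample))
```

To run it against a real dump, read the file in binary mode (open(path, "rb").read()) and pass the bytes to both functions. A clean result here would suggest the file itself is fine and the problem is in how Perl handles the strings at print time.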

