Can anyone who also uses Ruby enlighten me? For benchmarking purposes
this Perl 5.16 script works fine parsing a large Maildir folder:
use 5.016;
use autodie;
my $dir = 'my/mail/path';
chdir $dir;
opendir my $dh, $dir;
while
You can use the Ruby String#encode method to force UTF-8 encoding on the
string and have invalid byte sequences replaced. At a guess, your Perl code
is happy with the invalid sequence because it's not treating the string as
Unicode at all. I'd expect it to fail in the same way if you force the
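
A minimal sketch of the String#encode suggestion above, assuming the offending data is plain ASCII plus a single stray 0xA3 byte (the sample string is made up for illustration, not taken from the actual mail file). As far as I recall, Ruby versions before 2.1 treated a UTF-8 to UTF-8 encode as a no-op, so this form may need Ruby 2.1 or later:

```ruby
# Made-up sample: ASCII text followed by one byte that is not valid UTF-8.
raw = "under \xA3".b                      # .b returns a binary (ASCII-8BIT) copy

# Label the bytes as UTF-8 without changing them; the string is now invalid.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?              # false, because 0xA3 alone is not UTF-8

# Re-encode to UTF-8, replacing each invalid byte sequence with '?'.
# Assumption: Ruby 2.1+; older versions may leave the string untouched.
cleaned = as_utf8.encode(Encoding::UTF_8, invalid: :replace, replace: '?')
puts cleaned
puts cleaned.valid_encoding?
```

Replacing the byte loses information, so this only makes sense when the goal is to get the parse through rather than to preserve the original character.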
Quoting gvim gvi...@gmail.com:
Can anyone who also uses Ruby enlighten me? For benchmarking
purposes this Perl 5.16 script works fine parsing a large Maildir
folder:
use 5.016;
use autodie;
my $dir = 'my/mail/path';
chdir $dir;
On Thu, Aug 22, 2013 at 8:39 AM, gvim gvi...@gmail.com wrote:
The problematic mail file doesn't display any non-ASCII characters when
opened in Vim. Here's the Ruby 2.0 error message:
How about when you hexdump it?
On 22/08/2013 16:59, Dave Cross wrote:
Without seeing your data (or knowing anything much about Ruby's
string-handling) I'd guess that your file is in one of the extended
ASCII character sets (probably ISO-8859-1 or cp1252). You haven't told
Perl to decode the data in any way, so it's just
What problematic char? Why not just tell Ruby your strings are Latin-1? BTW
Latin-1 is not ASCII. If your data really *was* ASCII (a 7-bit charset), as
you had claimed, it would also be perfectly valid UTF-8.
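
To illustrate the point above, here is a rough sketch assuming the data really is Latin-1 (ISO-8859-1); that is a guess about the file, not something the thread confirms. The sample strings are invented for the example:

```ruby
# Made-up sample bytes: ASCII text plus 0xA3, which is the pound sign in Latin-1.
raw = "under \xA3".b

# Tell Ruby the bytes are Latin-1, then transcode to UTF-8.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
utf8   = latin1.encode(Encoding::UTF_8)
puts utf8                                  # the pound sign survives as U+00A3
p utf8.bytes.last(2)                       # UTF-8 encodes U+00A3 as two bytes

# Pure 7-bit ASCII needs no conversion: it is already valid UTF-8.
ascii_only = "X-Mozilla-Keys: ".b
ascii_as_utf8 = ascii_only.force_encoding(Encoding::UTF_8)
puts ascii_as_utf8.valid_encoding?
```

Unlike replacing invalid bytes, this keeps the original character, but it gives wrong results if the file is actually in some other 8-bit encoding.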
To be clear, Ruby is correct, but if you tell it your data isn't in the
encoding it
On 22/08/2013 17:05, Paul Makepeace wrote:
How about when you hexdump it?
I wouldn't know but here's the result of hexdump -C (literal text
removed from line end):
0000 58 2d 4d 6f 7a 69 6c 6c 61 2d 4b 65 79 73 3a 20
0010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
*
Quoting gvim gvi...@gmail.com:
On 22/08/2013 17:05, Paul Makepeace wrote:
How about when you hexdump it?
I wouldn't know but here's the result of hexdump -C (literal text
removed from line end):
0560 75 67 68 74 20 66 6f 72 20 75 6e 64 65 72 20 a3
There's a pound sign at the
On Thu, Aug 22, 2013 at 9:15 AM, gvim gvi...@gmail.com wrote:
On 22/08/2013 17:05, Paul Makepeace wrote:
How about when you hexdump it?
I wouldn't know but here's the result of hexdump -C (literal text removed
from line end):
You're looking for high bits in the characters, as a first
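
One way to look for high-bit bytes without reading a hexdump by eye is sketched below. The helper name `high_bit_bytes` is hypothetical, and the sample is just the sixteen bytes from the hexdump line at offset 0560 quoted earlier, not the whole file:

```ruby
# Hypothetical helper: return [offset, byte] pairs for every byte above 0x7F,
# i.e. every byte with its high bit set (not 7-bit ASCII).
def high_bit_bytes(data)
  data.bytes.each_with_index
      .select { |byte, _offset| byte > 0x7F }
      .map { |byte, offset| [offset, byte] }
end

# The bytes from the hexdump line at 0560: "ught for under " followed by 0xA3.
sample = "ught for under \xA3".b

hits = high_bit_bytes(sample)
hits.each do |offset, byte|
  puts format('offset 0x%04x: byte 0x%02x', offset, byte)
end
```

For a real file, something like `File.binread(path)` could supply the data; the offsets reported are then relative to the start of the file.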