Unicode

Craig Chant Thu, 04 Jul 2013 01:59:00 -0700

Thanks Anthony,

I will try your suggestions when I'm back in the office tomorrow; I'm on a 
study day today for my OU course.

It would be good to track this down and fix it without having to refactor to 
Win32::ODBC. I eventually want to look at replacing my own DBI wrapper with 
the DBIC ORM, but I'm concerned this wouldn't be possible if I can't get DBI to 
play ball with MS.

>> Your data is being stored in Unicode data typed columns right?

Yes, it's NVARCHAR(max), which I understood is MS's data type for uNicode 
VARiable CHARacters. Looking at some sample column data via the Windows SQL 
Management GUI, it appears to display ok.

I know that the data being pasted into it is coming from an MS Access front end 
application that is linked to the same backend SQL server.

I also know that the input is a memo / rich text box control on the form 
(view), bound directly to the table column via a linked table definition with 
the backend SQL server, and that some of what they enter is copied and pasted 
from emails and MS Word documents (and possibly PDFs).

I can't see any odd characters looking at a small amount of sample data on the 
SQL server, and the data comes out of Win32::ODBC looking ok too.

From what I can tell the data is in Unicode during capture and storage, it's 
just the retrieval with DBI where it seems to be breaking down.

I already have to include a longread setting when using DBD::ODBC (via DBI) 
with SQL Server, otherwise it falls over with the data being too long, so 
perhaps there is another parameter I need?

I really appreciate all the help you guys have given so far, thank you.

Regards,

Craig

________________________________
From: Anthony Lucas [anthonyjlu...@gmail.com]
Sent: 04 July 2013 01:09
To: The elegant MVC web framework
Subject: Re: [Catalyst] CSV / UTF-8 / Unicode

On 3 July 2013 11:18, Craig Chant <cr...@homeloanpartnership.com> wrote:

>> Maybe write a standalone test and take Catalyst and browser quirks out of 
>> the picture.

I have already done this. I have two SQL wrapper modules, one that uses 
DBD::ODBC (via DBI) and one that uses Win32::ODBC. I ran the same standalone 
script that produces CSV output against each; the only difference between the 
tests was which wrapper accessed SQL. DBI output junk chars, Win32::ODBC 
didn't. What else should I be doing to test for the culprit of the corruption?

You need to see how they are using the ODBC API underneath for handling the 
data and encoding.
Setting the trace flag on DBI (i.e. DBI->trace(n)) will expose the DBD::ODBC 
activity. I'm not sure of the debugging available for Win32::ODBC.

One thing I would check first is what they are treating the column data as. If 
DBD::ODBC is treating the columns as WCHAR but Win32::ODBC is treating them as 
CHAR and then doing extra "magic" decoding (or not), well then you've found a 
big clue. There has to be different handling or differing levels of ODBC 
support somewhere.
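
To make the WCHAR-versus-CHAR point concrete, here is a small sketch in Python (chosen only for illustration; the thread itself is about Perl). It shows how the same text turns into junk characters when bytes written in one encoding are decoded as another. The sample string and the function name `show_mismatch` are made up for this example, and it does not claim to reproduce what DBD::ODBC or Win32::ODBC actually do internally.

```python
# Illustration only: how a mismatch between "wide" (UTF-16LE) and
# single-byte (Windows-1252) handling produces junk characters.
# SAMPLE is hypothetical text of the kind pasted from MS Word.

SAMPLE = "“quoted” £5"


def show_mismatch(text: str) -> dict:
    """Return the text as seen through correct and incorrect decodings."""
    wide = text.encode("utf-16-le")  # how a WCHAR buffer would carry it
    utf8 = text.encode("utf-8")      # how a UTF-8 CSV file would carry it
    return {
        # Correct: decode each byte string with the encoding it was written in.
        "wide_ok": wide.decode("utf-16-le"),
        "utf8_ok": utf8.decode("utf-8"),
        # Wrong: UTF-8 bytes read as Windows-1252 give the familiar junk.
        "utf8_as_cp1252": utf8.decode("cp1252", errors="replace"),
        # Wrong: UTF-16LE bytes read as Windows-1252 are also garbage.
        "wide_as_cp1252": wide.decode("cp1252", errors="replace"),
    }


if __name__ == "__main__":
    for label, value in show_mismatch(SAMPLE).items():
        print(f"{label:>16}: {value!r}")
```

If the junk in the CSV resembles the `utf8_as_cp1252` line, the bytes are probably fine and only the interpretation is wrong somewhere downstream; that is a hint, not a diagnosis.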

I would assume that DBD::ODBC is doing "the right thing", and something else is 
amiss upstream (but well, never assume with Unicode handling, so make sure with 
the trace).

>> Also, you are aware that your data will probably be coming back as UCS2 if 
>> you're using SQL Server right?

No. What is UCS2, and is it handled differently in DBD::ODBC vs 
Win32::ODBC?

From what I understand, is this ultimately what you've got happening?
Original Input Data -> SQL Client -> Database Driver -> Database (UCS2) -> 
Windows ODBC Driver -> DBD::ODBC -> Catalyst(?)

If so, since you're storing the data as Unicode and the database driver knows 
this (because your column type is NVARCHAR etc.), conversion to UCS2 happens at 
the driver stage on Windows. This is lossless between the different Unicode 
encodings, so just make sure your input is valid Unicode up to that point and 
that your data is being stored correctly.

Your data is being stored in Unicode data typed columns, right?


_______________________________________________
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/
