On 2021-05-27 23:35:32 +0000, Bruce  Johnson wrote:
> The column in question is a varchar2(4000) so I am truncating the
> input to 4000 bytes.
> 
> (via use “bytes; $string=substr($orig,0,4000); ” in the section where
> I actually truncate the string.)
> 
> When I do the insert I get an  ORA-12899: value too large for column
> "AWARD"."PRECEPT_NOMINATIONS"."ONCOLOGY_COMMENTS" (actual: 4054,
> maximum: 4000) error

Is $orig actually UTF-8 encoded at that time? If it only contains
codepoints <= U+00FF, it might be 1 byte/character, but when sent to
Oracle the codepoints between U+0080 and U+00FF will still be encoded as
two bytes. 

Something like:

    my $bs = encode_utf8($orig);
    my $sbs = substr($bs, 0, $len);
    my $truncated = decode_utf8($sbs, FB_QUIET);

would be safer.

Also, waht encoding do you use in Oracle? UTF8 isn't really UTF-8: It's
a weird double-encoding which may take up to 6 six bytes for non-BMP
characters. To get real UTF-8, use AL32UTF8.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment: signature.asc
Description: PGP signature

Reply via email to