Patrick,
Thanks for the patch. I'd actually tried a similar change, but when
tracing through the code I found that the encode/decode routines
weren't even getting called. So I think the problem is also in how I'm
passing data between Perl and Java (currently all arguments are
encapsulated in an array that contains a hash table).
I'll let you know what happens when I get it fixed!
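For what it's worth, here is a minimal Java sketch of why the
byte-wise decode can't work for multibyte characters, and what a
char-based version along the lines of your patch does instead. The
class and method names are made up for illustration (they are not from
InlineJavaProtocol.java), and I'm assuming UTF-8 as the charset, as
with my en_US.UTF-8 locale:

```java
import java.nio.charset.StandardCharsets;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of Inline::Java.
class DotCodecSketch {

    // Byte-wise decode: every dotted number becomes one byte, and each
    // byte is converted to a String on its own, so the bytes of a
    // multibyte sequence are never decoded together.
    static String decodeBytes(String s) {
        StringBuilder sb = new StringBuilder();
        StringTokenizer st = new StringTokenizer(s, ".");
        while (st.hasMoreTokens()) {
            byte[] b = { (byte) Integer.parseInt(st.nextToken()) };
            // Assumption: UTF-8 stands in for the platform charset.
            sb.append(new String(b, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Char-wise decode: every dotted number is treated as a character
    // value, so no byte-to-char conversion is involved at all.
    static String decodeChars(String s) {
        StringBuilder sb = new StringBuilder();
        StringTokenizer st = new StringTokenizer(s, ".");
        while (st.hasMoreTokens()) {
            sb.append((char) Integer.parseInt(st.nextToken()));
        }
        return sb.toString();
    }

    // Char-wise encode: the numeric value of each char, joined by dots.
    static String encodeChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append((int) s.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        String wire = encodeChars(original);
        System.out.println(wire);                                // 99.97.102.233
        System.out.println(decodeChars(wire).equals(original));  // true
    }
}
```

Here "195.169" (the UTF-8 bytes of e-acute) does not come back from
decodeBytes as a single character, while "233" comes back from
decodeChars as U+00E9. Note that the (char) cast only covers code
points up to U+FFFF; anything above that would need surrogate
handling, which this sketch ignores.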
Thanks again,
dave
>>>>> "Patrick" == Patrick LeBoutillier <[EMAIL PROTECTED]> writes:
Patrick> Here's a patch I think will work (I did some minimal
Patrick> testing and it worked out ok) :
Patrick> RCS file: /cvsroot/inline-java/Inline-Java/Java/Protocol.pm,v
Patrick> retrieving revision 1.33
Patrick> diff -r1.33 Protocol.pm
Patrick> 413c413
Patrick> < return join(".", unpack("C*", $s)) ;
Patrick> ---
Patrick> > return join(".", unpack("U*", $s)) ;
Patrick> 420c420
Patrick> < return pack("C*", split(/\./, $s)) ;
Patrick> ---
Patrick> > return pack("U*", split(/\./, $s)) ;
Patrick> and
Patrick> RCS file: /cvsroot/inline-java/Inline-Java/Java/sources/InlineJavaProtocol.java,v
Patrick> retrieving revision 1.2
Patrick> diff -r1.2 InlineJavaProtocol.java
Patrick> 614,615c614,615
Patrick> < byte b[] = {(byte)Integer.parseInt(ss)} ;
Patrick> < sb.append(new String(b)) ;
Patrick> ---
Patrick> > char c = (char)Integer.parseInt(ss) ;
Patrick> > sb.append(new String(new char [] {c})) ;
Patrick> 623c623,624
Patrick> < byte b[] = s.getBytes() ;
Patrick> ---
Patrick> > char c[] = new char[s.length()] ;
Patrick> > s.getChars(0, c.length, c, 0) ;
Patrick> 625c626
Patrick> < for (int i = 0 ; i < b.length ; i++){
Patrick> ---
Patrick> > for (int i = 0 ; i < c.length ; i++){
Patrick> 629c630,631
Patrick> < sb.append(String.valueOf(b[i])) ;
Patrick> ---
Patrick> > sb.append((int)c[i]) ;
Patrick> Let me know how it turns out.
Patrick> Patrick
Patrick> ---------------------
Patrick> Patrick LeBoutillier
Patrick> Laval, Quebec, Canada
Patrick> ----- Original Message -----
Patrick> From: "Patrick LeBoutillier" <[EMAIL PROTECTED]>
Patrick> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Patrick> Sent: Thursday, June 05, 2003 8:28 AM
Patrick> Subject: Re: Inline::Java and utf8
>> Dave,
>>
>> It's possible that there is a problem here. Inline::Java uses a
>> very simple (and somewhat inefficient) encoding to pass the data
>> between Perl and Java. Here is the corresponding code:
>>
>> sub encode {
>>     my $s = shift ;
>>
>>     return join(".", unpack("C*", $s)) ;
>> }
>>
>> and
>>
>> String Decode(String s){
>>     StringTokenizer st = new StringTokenizer(s, ".") ;
>>     StringBuffer sb = new StringBuffer() ;
>>     while (st.hasMoreTokens()){
>>         String ss = st.nextToken() ;
>>         byte b[] = {(byte)Integer.parseInt(ss)} ;
>>         sb.append(new String(b)) ;
>>     }
>>
>>
>> It breaks up the string byte by byte and reconstructs it on the
>> other side. It's probable that this doesn't work with multibyte
>> characters since it's probably creating a character for each
>> byte.
>>
>> If you have time to check this out and send me a patch that
>> would be great, but I don't have the time currently to
>> investigate this. I have no problem reviewing the encoding
>> completely; I did it like this to make sure I could implement
>> the protocol line by line. Maybe only escaping the \n's would
>> have been sufficient.
>>
>> Anyways comments/suggestions are welcome.
>>
>>
>> ---------------------
>> Patrick LeBoutillier
>> Laval, Quebec, Canada
>> ----- Original Message -----
>> From: "Dave LaMacchia" <[EMAIL PROTECTED]>
>> To: <[EMAIL PROTECTED]>
>> Sent: Wednesday, June 04, 2003 9:19 PM
>> Subject: Inline::Java and utf8
>>
>>
>> >
>> > I'm working on some code that uses Inline::Java to parse user
>> > input in order to make calls to a corba interface in front of
>> > an oracle database.
>> >
>> > I found when I fetch utf8 data from the database, all is well
>> > (assuming I've set my locale -- this is on Solaris 2.8 -- to
>> > en_US.UTF-8). When I go the other way, however, passing data
>> > from perl to Java via Inline, I get data corruption in the
>> > non-ASCII characters.
>> >
>> > I thought that I might have to convert the strings to UCS2,
>> > since that's what Java uses internally, but this results in
>> > java errors due to embedded null characters.
>> >
>> > Has anyone run into this problem before? Any suggestions how
>> > to get around it? I'm using perl 5.8 so I shouldn't have to
>> > insert a use utf8 pragma. Note also that I've confirmed the
>> > data is correct in the perl code before the embedded Java is
>> > called.
>> >
>> > Thanks!
>> >
>> > --dave
>> >
>>
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Dave LaMacchia http://www.sleepwalk.org/
[EMAIL PROTECTED]