Patrick,

Thanks for the patch -- I'd actually tried a similar change. When tracing through the code, though, I found that the encode/decode routines weren't even getting called, so I think the problem is also in how I'm passing data between perl and java (currently all arguments are encapsulated in an array that contains a hash table).

I'll let you know what happens when I get it fixed!

Thanks again,
dave

>>>>> "Patrick" == Patrick LeBoutillier <[EMAIL PROTECTED]> writes:

Patrick> Here's a patch I think will work (I did some minimal
Patrick> testing and it worked out ok) :

Patrick> RCS file: /cvsroot/inline-java/Inline-Java/Java/Protocol.pm,v
Patrick> retrieving revision 1.33
Patrick> diff -r1.33 Protocol.pm
Patrick> 413c413
Patrick> < return join(".", unpack("C*", $s)) ;
Patrick> ---
Patrick> > return join(".", unpack("U*", $s)) ;
Patrick> 420c420
Patrick> < return pack("C*", split(/\./, $s)) ;
Patrick> ---
Patrick> > return pack("U*", split(/\./, $s)) ;

Patrick> and

Patrick> RCS file: /cvsroot/inline-java/Inline-Java/Java/sources/InlineJavaProtocol.java,v
Patrick> retrieving revision 1.2
Patrick> diff -r1.2 InlineJavaProtocol.java
Patrick> 614,615c614,615
Patrick> < byte b[] = {(byte)Integer.parseInt(ss)} ;
Patrick> < sb.append(new String(b)) ;
Patrick> ---
Patrick> > char c = (char)Integer.parseInt(ss) ;
Patrick> > sb.append(new String(new char [] {c})) ;
Patrick> 623c623,624
Patrick> < byte b[] = s.getBytes() ;
Patrick> ---
Patrick> > char c[] = new char[s.length()] ;
Patrick> > s.getChars(0, c.length, c, 0) ;
Patrick> 625c626
Patrick> < for (int i = 0 ; i < b.length ; i++){
Patrick> ---
Patrick> > for (int i = 0 ; i < c.length ; i++){
Patrick> 629c630,631
Patrick> < sb.append(String.valueOf(b[i])) ;
Patrick> ---
Patrick> > sb.append((int)c[i]) ;

Patrick> Let me know how it turns out.

Patrick> Patrick
Patrick> ---------------------
Patrick> Patrick LeBoutillier
Patrick> Laval, Quebec, Canada

Patrick> ----- Original Message -----
Patrick> From: "Patrick LeBoutillier" <[EMAIL PROTECTED]>
Patrick> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Patrick> Sent: Thursday, June 05, 2003 8:28 AM
Patrick> Subject: Re: Inline::Java and utf8

>> Dave,
>>
>> It's possible that there is a problem here. Inline::Java uses a very
>> simple (and somewhat inefficient) encoding to pass the data between
>> Perl and Java.
>> Here is the corresponding code:
>>
>> sub encode {
>>     my $s = shift ;
>>
>>     return join(".", unpack("C*", $s)) ;
>> }
>>
>> and
>>
>> String Decode(String s){
>>     StringTokenizer st = new StringTokenizer(s, ".") ;
>>     StringBuffer sb = new StringBuffer() ;
>>     while (st.hasMoreTokens()){
>>         String ss = st.nextToken() ;
>>         byte b[] = {(byte)Integer.parseInt(ss)} ;
>>         sb.append(new String(b)) ;
>>     }
>>
>>
>> It breaks up the string byte by byte and reconstructs it on the other
>> side. It's probable that this doesn't work with multibyte characters,
>> since it's probably creating a character for each byte.
>>
>> If you have time to check this out and send me a patch that would be
>> great, but I don't have the time currently to investigate this. I have
>> no problem reviewing the encoding completely; I did it like this to make
>> sure I could implement the protocol line by line. Maybe only escaping
>> the \n's would have been sufficient.
>>
>> Anyways comments/suggestions are welcome.
>>
>>
>> ---------------------
>> Patrick LeBoutillier
>> Laval, Quebec, Canada
>>
>> ----- Original Message -----
>> From: "Dave LaMacchia" <[EMAIL PROTECTED]>
>> To: <[EMAIL PROTECTED]>
>> Sent: Wednesday, June 04, 2003 9:19 PM
>> Subject: Inline::Java and utf8
>>
>>
>> > I'm working on some code that uses Inline::Java to parse user input in
>> > order to make calls to a corba interface in front of an oracle
>> > database.
>> >
>> > I found when I fetch utf8 data from the database, all is well
>> > (assuming I've set my locale -- this is on Solaris 2.8 -- to
>> > en_US.UTF-8). When I go the other way, however, passing data from
>> > perl to Java via Inline, I get data corruption in the non-ASCII
>> > characters.
>> >
>> > I thought that I might have to convert the strings to UCS2, since
>> > that's what Java uses internally, but this results in java errors due
>> > to embedded null characters.
>> >
>> > Has anyone run into this problem before?
>> > Any suggestions how to get around it? I'm using perl 5.8 so I
>> > shouldn't have to insert a use utf8 pragma. Note also that I've
>> > confirmed the data is correct in the perl code before the embedded
>> > Java is called.
>> >
>> > Thanks!
>> >
>> > --dave
>> >
>>

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Dave LaMacchia http://www.sleepwalk.org/ [EMAIL PROTECTED]
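
For reference, here is a minimal, self-contained Java sketch of the dotted-decimal encoding discussed in the quoted messages. It contrasts the per-character decoding that the patch moves to with the older one-character-per-byte decoding. The class and method names (`DottedCodec`, `encode`, `decode`, `encodeUtf8Bytes`, `decodeBytewise`) are made up for illustration and are not part of Inline::Java. The byte-wise variant pins the charset to ISO-8859-1 so the result is deterministic; that is an assumption, since the quoted `new String(b)` uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringTokenizer;

// Hypothetical illustration class; not Inline::Java code.
class DottedCodec {
    // Encode each UTF-16 code unit as a decimal number, joined by dots
    // (the approach taken on the Java side of the patch).
    public static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append('.');
            sb.append((int) s.charAt(i));
        }
        return sb.toString();
    }

    // Decode dotted decimals back into one char per number.
    public static String decode(String s) {
        StringTokenizer st = new StringTokenizer(s, ".");
        StringBuilder sb = new StringBuilder();
        while (st.hasMoreTokens()) {
            sb.append((char) Integer.parseInt(st.nextToken()));
        }
        return sb.toString();
    }

    // Encode the UTF-8 bytes of a string as dotted decimals, roughly what a
    // byte-oriented sender would emit for UTF-8 data.
    public static String encodeUtf8Bytes(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(bytes[i] & 0xFF); // unsigned value 0..255
        }
        return sb.toString();
    }

    // Old-style decoding: every number becomes its own one-byte string.
    // ISO-8859-1 is assumed here so each byte maps to exactly one char.
    public static String decodeBytewise(String s) {
        StringTokenizer st = new StringTokenizer(s, ".");
        StringBuilder sb = new StringBuilder();
        while (st.hasMoreTokens()) {
            byte[] b = {(byte) Integer.parseInt(st.nextToken())};
            sb.append(new String(b, StandardCharsets.ISO_8859_1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"
        String perChar = encode(original);
        System.out.println(perChar); // prints 99.97.102.233
        System.out.println(decode(perChar).equals(original)); // prints true

        // The two UTF-8 bytes of U+00E9 come back as two separate characters.
        String perByte = encodeUtf8Bytes("\u00e9");
        System.out.println(perByte); // prints 195.169
        System.out.println(decodeBytewise(perByte).length()); // prints 2
    }
}
```

Note that this sketch, like the patch, casts each number to a single `char`, so a code point above U+FFFF sent by Perl's `unpack("U*", ...)` would not survive the cast; `StringBuilder.appendCodePoint` would be one way to handle that case.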