Laurent--

Thanks for the quick reply. See comments below:

On Dec 5, 2009, at 4:22 PM, Laurent Sansonetti wrote:

> Hi Steve,
>
> On Dec 5, 2009, at 1:45 PM, s.ross wrote:
>
>> My code receives XML data from a Web Service API call that is in UTF8
>> encoding. This winds up in a string.
>>
>> return_data = NSURLConnection.sendSynchronousRequest(@request,
>> returningResponse: response, error: error)
>> str = NSString.alloc.initWithData(return_data, encoding:
>> NSUTF8StringEncoding)
>> puts "******* response encoding it #{str.encoding}"
>>
>> The result of the puts above is 'MACINTOSH'.
>>
>> I suspect the encoding of the string is not UTF-8, because when I try to
>> parse the XML using REXML, I get:
>>
>> RegexpError: too short multibyte code
>>
>> This occurs way in REXML:
>>
>> /Library/Frameworks/MacRuby.framework/Versions/0.5/usr/lib/ruby/1.9.0/rexml/text.rb:132:in
>> `check:'
>>
>> In any case, my questions are:
>>
>> 1) If anyone has run across this what did you do?
>
> I don't believe REXML works. In any case, I would recommend to not use it.
> Since you're already using Cocoa, why not giving NSXMLDocument a try?

What I really want to use is Nokogiri. My main issue is that I'm having to
reimplement XML-RPC because the Ruby standard library version is broken over
SSL. Even if it weren't, it has never been thread-safe and thus can't operate
asynchronously.

As a result, what I have is an XML document inside an XML-RPC response
envelope. That means I have to parse the document once to get the contents of
the envelope (which are HTML-escaped), then parse those contents to get an XML
document I can work with. I've been using XPath for that, and that's why I
haven't moved over to NSXMLDocument. Maybe I'm missing a bet here and should
shift my strategy. I'll do some more reading...

>> 2) Why might the encoding be MACINTOSH and not UTF-8, as specified in the
>> initWithData method call?
>
> #encoding returns the fastest encoding available for the receiver.
> You may specify UTF-8 during the string creation, but if Cocoa can pick a
> smaller encoding at runtime (like ASCII) it will.
>
> This is different from the Ruby 1.9 semantics and we have a plan to fix that
> in 0.6.

This is kind of surprising behavior. The 1.9 semantics are sufficiently
different from 1.8.x that code that works correctly on 1.8.7 breaks awkwardly
on 1.9. OK, but I fixed that in an MRI version, and the gotcha above broke my
MacRuby version. Now that I know this, I guess I can deal with it.

>> 3) Suggestions?
>
> See my comment in 1) :)
>
> Laurent

_______________________________________________
MacRuby-devel mailing list
MacRuby-devel@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
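
For reference, here is a minimal sketch of a UTF-8 check under the Ruby 1.9 semantics mentioned above, where the encoding is a tag carried by the string rather than something picked at runtime. It is plain Ruby for MRI (2.0 or later, because of String#b), not MacRuby, and it is not the MRI fix referred to in the thread. The helper name utf8_payload and the sample payload are made up for illustration.

```ruby
# Hypothetical helper (not from the thread): tag raw response bytes as UTF-8
# and verify them before handing the string to an XML parser.
def utf8_payload(raw_bytes)
  # dup leaves the caller's string alone; force_encoding only changes the
  # encoding tag and does not transcode any bytes.
  str = raw_bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "response is not valid UTF-8" unless str.valid_encoding?
  str
end

# Simulated response body: binary bytes with no useful encoding tag yet.
raw = "<name>caf\xC3\xA9</name>".b
xml = utf8_payload(raw)

puts xml.encoding  # the encoding tag on the string
puts xml.length    # counted in characters, not bytes
```

A truncated multibyte sequence makes utf8_payload raise instead of failing later inside the parser. On MacRuby 0.5 the reported encoding may still differ, since #encoding there returns whatever Cocoa picked.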