On Tue, 13 Dec 2005, tsuraan wrote:

I was wondering how PyLucene does unicode support.  What I had been
doing was running python's string.decode function to get a unicode
object that I was passing to my C++ backend as a plain array of bytes
(I don't even know if this worked, but it seems reasonable).

PyLucene is based on Java Lucene, and as such the Java layer only accepts 16-bit unicode strings. When you pass a Python 'str' to a PyLucene API, it is assumed to be encoded in utf-8 and is converted to unicode for Java accordingly. If you pass in a Python unicode string, then, depending on the size of the Python unicode char on your platform, the chars are either passed as-is or cast to 16 bits. The 32-to-16-bit cast is likely to be bogus for unicode chars that have more than 16 significant bits. This is a known bug.
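To illustrate the two conversions described above without PyLucene itself, here is a plain-Python sketch: a byte string is decoded under the utf-8 assumption, and a character above U+FFFF shows why a plain 32-to-16-bit cast loses information (in Java's 16-bit representation it needs a surrogate pair):

```python
# Plain-Python illustration of the conversions described above;
# PyLucene performs these internally.

# A byte string ('str' in Python 2) is assumed to be utf-8.
raw = b"na\xc3\xafve"            # "naive" with an i-diaeresis, utf-8 encoded
text = raw.decode("utf-8")
assert text == u"na\u00efve"

# A char above U+FFFF has more than 16 significant bits, so a simple
# 32-to-16-bit cast truncates it.  In 16-bit (Java) unicode it must be
# encoded as a surrogate pair instead:
astral = u"\U00010400"
utf16 = astral.encode("utf-16-be")
assert utf16 == b"\xd8\x01\xdc\x00"   # surrogate pair D801 DC00
```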

If I get a unicode string in a document I'm parsing, how do I search on that
string from python?  Do I just give the constructors to my Term
objects and QueryParsers unicode strings, and have them use that?  I
haven't had the courage to dive into the source yet, and I figured
this would probably be an easy question for you to answer :)

The back-and-forth conversion between Python str/unicode and Java unicode is handled automatically for you by the p2j() and j2p() functions defined in PyLucene.i. So yes, you can pass unicode strings directly to your Term and QueryParser constructors.
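The practical consequence is that an utf-8 encoded str and the corresponding unicode string denote the same term text once converted. A minimal sketch of that equivalence, using a hypothetical stand-in for the p2j() conversion (plain Python, no PyLucene required):

```python
def to_java_unicode(s):
    """Hypothetical stand-in for PyLucene's p2j() conversion:
    byte strings are assumed to be utf-8; unicode passes through."""
    if isinstance(s, bytes):
        return s.decode("utf-8")
    return s

# An utf-8 byte string and the equivalent unicode string convert to
# the same term text, so either form can be handed to the API.
assert to_java_unicode(b"\xc3\xa9cole") == to_java_unicode(u"\u00e9cole")
```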

Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
