Hi Andi,
thanks for your feedback and for the code cleanup.

Regarding the toArray issue: I tried different variants of the typed JArray
constructor, and it turned out that these two alternatives basically work
(example for int types):

1) return JArray(object)([<lucene.Integer()-object>*])
or
2) return JArray(int)([<Python-int-literal>*])

Also surprising (to me): there are different ways to construct these
template types, using either a type or a string:

 >>> lucene.JArray(int)([1,2])
 and
 >>> lucene.JArray('int')([1,2])
 both create the same type <type 'JArray_int'>

I would have preferred option #2 in toArray (for both JavaList and
JavaSet) as it does not require wrapping the values into Java Integer (etc.)
objects. However, this method does not work with lucene.ArrayList:

 >>> x=lucene.JArray('int')([1,2])
 JArray<int>[1, 2]
 >>> y=lucene.ArrayList(x)
 Traceback: lucene.InvalidArgsError:
  (<type 'ArrayList'>, '__init__', (JArray<int>[1, 2],))

So I decided to choose the Java-object wrapper option (#1) and implemented
toArray for primitive types (int,float,long,bool). It turned out that
wrapping strings is not needed.  That way the collections-demo runs fine and
I can init a lucene.ArrayList with the JavaSet or JavaList for the mentioned
types.
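
The boxing step described above can be sketched without a JVM. This is only an illustration under assumptions: `Box`, `make_boxer`, `WRAPPERS` and `box_elements` are hypothetical stand-ins for lucene.Integer, lucene.Float, lucene.Boolean and the `_types` table in collections.py, not part of PyLucene.

```python
class Box(object):
    """Hypothetical stand-in for a Java wrapper object such as lucene.Integer."""

    def __init__(self, kind, value):
        self.kind = kind
        self.value = value

    def __repr__(self):
        return "%s(%r)" % (self.kind, self.value)


def make_boxer(kind):
    # Returns a one-argument constructor, like calling lucene.Integer(5).
    return lambda value: Box(kind, value)


# Python type -> wrapper constructor (mirrors the idea of _types).
WRAPPERS = {bool: make_boxer("Boolean"),
            int: make_boxer("Integer"),
            float: make_boxer("Float")}


def box_elements(values):
    """Wrap every element if the first one has a known primitive type.

    As in toArray(), all elements are assumed to share the type of the
    first element; other types (e.g. strings) are passed through as-is.
    """
    values = list(values)
    if values:
        wrapper = WRAPPERS.get(type(values[0]))
        if wrapper is not None:
            values = [wrapper(v) for v in values]
    return values


boxed_ints = box_elements([1, 2, 3])
boxed_bools = box_elements([True, False])
plain_strings = box_elements(["a", "b"])
```

The lookup uses the exact type of the first element, so bool values map to the Boolean wrapper even though bool is a subclass of int in Python.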

Attached is a revised version of collections.py and collections-demo.py
(which should run without error now).

However there's still one question/issue as you can see from the output of
collections-demo.py (and some commented 'test code' in collections-demo.py):

created JArray: JArray<object>[<Object: 0>, <Object: 1>, <Object: 2>,
<Object: 3>, ...] <type 'JArray_object'>
created ArrayList: [java.lang.Object@785d65, java.lang.Object@785d65,
java.lang.Object@785d65, java.lang.Object@785d65....,] <type 'ArrayList'>

It looks as if the objects passed in from JavaSet to lucene.ArrayList all end
up as the same object (that's also why indexOf behaves somewhat strangely).
It could be a bug in my test code, but this is no problem for
lucene.HashSet(JavaSet), for example, so I'm really curious what's going on
here...

If you have any ideas, please let me know. I will also look into it again if
I get some time, but I will be busy for most of this week and out of office
next week.

regards,
Thomas

-----Original Message-----
From: Andi Vajda [mailto:[email protected]] 
Sent: Monday, 12 March 2012 03:34
To: [email protected]
Cc: Thomas Koch
Subject: Re: AW: AW: AW: Setting Stopword Set in PyLucene (or using Set in
general)


  Hi Thomas,

On Fri, 2 Mar 2012, Thomas Koch wrote:

> thanks for the feedback! I revised the code and send you attached a 
> new patch.

Sorry for the delay in getting back to you.

I integrated your patch and fixed a bunch of formatting and bugs in it.
The collections-demo.py is not fully functional yet so I attach it here too,
somewhat fixed up as well.

There is a bug somewhere with constructing an ArrayList from a python
collection like JavaSet or JavaList. At some point, toArray() gets called,
the right array is returned (almost, see below) but the ArrayList looks like
built from an array of empty objects.

> I also attach a short demo script that shows the problems I mentioned 
> earlier when trying to initialize an ArrayList with a JavaSet (or 
> JavaList) containing integers.

For that, the toArray() methods in collections.py must create the correct
array type using int, float, etc... instead of object, based on what's in the
python object.
Alternatively, these methods need to box the int values by
wrapping them into a Java Integer object (for example, lucene.Integer(5)).
I leave that to you to continue with, I'm out of time for right now :-)

> Finally I'd suggest to rename collections.py because there's one 
> defined on Python lib already:
> http://docs.python.org/library/collections.html

Until this happens, you can use:
  from lucene import collections
as the collections.py file gets installed in the lucene package.

Throwing Java exceptions from Python is done by raising JavaError with the
desired Java exception object (I added a few to the jcc call in PyLucene's
Makefile), for example:
   raise JavaError, NoSuchElementException(str(index))

It's been like that for a very long time, I just forgot.
This is implemented by throwPythonError() in jcc's functions.cpp: if the
error is JavaError, then the Java exception instance used as argument to it
is raised to the JVM.
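
As a rough illustration of that pattern, here is a sketch that runs without a JVM. The two classes are hypothetical stand-ins for lucene.JavaError and the JCC-generated NoSuchElementException, and `element_at` is an invented helper. The call form `raise JavaError(exc)` is equivalent to the Python 2 statement `raise JavaError, exc` shown above.

```python
class NoSuchElementException(object):
    """Hypothetical stand-in for the wrapped java.util.NoSuchElementException."""

    def __init__(self, message):
        self.message = message

    def getMessage(self):
        return self.message


class JavaError(Exception):
    """Hypothetical stand-in for lucene.JavaError: carries a Java exception."""

    def getJavaException(self):
        # The Java exception object is the single argument of the error.
        return self.args[0]


def element_at(items, index):
    # Signal an out-of-range access the way a Python extension of a Java
    # class would: wrap the Java exception object in JavaError.
    if index < 0 or index >= len(items):
        raise JavaError(NoSuchElementException(str(index)))
    return items[index]


try:
    element_at(["a", "b"], 5)
except JavaError as e:
    caught = e.getJavaException()
```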

I attached the not-checked-in diffs as patches. The new Makefile is checked
into the pylucene-3.x branch.

> Below are some comments to your comments...

More responses inline below.

> Ok, I was unsure on how to properly throw a Java Exception in Python 
> code - and couldn't find an example.
> Also I thought a Java Exception type should be exported in lucene - 
> this is not the case however:
>>>> lucene.NoSuchElementException
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'NoSuchElementException'
>
> I imagine I could
> - add the java.util.NoSuchElementException to the Makefile to get it 
> generated by JCC and throw it via raise?
> - use lucene.JavaError and pass  'java.util.NoSuchElementException' 
> name in the constructor?

Yes, you guessed it right, this is how it works as outlined above.

You had various bugs in next()/nextIndex(), previous()/previousIndex() that
I hopefully fixed. Also, listIterator() can't be overridden in Python, I
fixed it in PythonList and in collections.py.
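
For reference, the cursor semantics that next()/nextIndex() and previous()/previousIndex() have to follow can be modelled in plain Python. This is a sketch only: `ListCursor` is a hypothetical name, and IndexError stands in for the NoSuchElementException that java.util.ListIterator would throw.

```python
class ListCursor(object):
    """Pure-Python model of java.util.ListIterator cursor movement."""

    def __init__(self, items, index=0):
        self._items = items
        self.index = index   # the cursor sits between elements
        self._last = -1      # index of the element last returned, or -1

    def hasNext(self):
        return self.index < len(self._items)

    def hasPrevious(self):
        return self.index > 0

    def nextIndex(self):
        # Equals the list size when the cursor is at the end.
        return self.index

    def previousIndex(self):
        # Equals -1 when the cursor is at the beginning.
        return self.index - 1

    def next(self):
        if not self.hasNext():
            raise IndexError(self.index)
        self._last = self.index
        self.index += 1
        return self._items[self._last]

    def previous(self):
        if not self.hasPrevious():
            raise IndexError(self.index - 1)
        self.index -= 1
        self._last = self.index
        return self._items[self._last]


cursor = ListCursor(["a", "b", "c"])
first = cursor.next()
second = cursor.next()
again = cursor.previous()
```

Alternating next() and previous() returns the same element repeatedly, which is why `again` is the element that `second` already returned.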

Andi..
#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.

from lucene import JArray, Boolean, Float, Integer, Long, String, \
    PythonSet, PythonList, PythonIterator, PythonListIterator, JavaError, \
    NoSuchElementException, IllegalStateException, IndexOutOfBoundsException


_types = { bool: Boolean,
          float: Float,
            int: Integer,
           long: Long,
# wrapping strings is not needed - actually the 'wrapper'
# lucene.String('xxx') does not exist!
#            str: String
         }

class JavaSet(PythonSet):
    """
    This class implements java.util.Set around a Python set instance it wraps.
    """

    def __init__(self, _set):
        super(JavaSet, self).__init__()
        self._set = _set

    def __contains__(self, obj):
        return obj in self._set

    def __len__(self):
        return len(self._set)

    def __iter__(self):
        return iter(self._set)

    def add(self, obj):
        if obj not in self._set:
            self._set.add(obj)
            return True
        return False

    def addAll(self, collection):
        size = len(self._set)
        self._set.update(collection)
        return len(self._set) > size

    def clear(self):
        self._set.clear()

    def contains(self, obj):
        return obj in self._set

    def containsAll(self, collection):
        for obj in collection:
            if obj not in self._set:
                return False
        return True

    def equals(self, collection):
        if type(self) is type(collection):
            return self._set == collection._set
        return False

    def isEmpty(self):
        return len(self._set) == 0

    def iterator(self):
        class _iterator(PythonIterator):
            def __init__(_self):
                super(_iterator, _self).__init__()
                _self._iterator = iter(self._set)
            def hasNext(_self):
                if hasattr(_self, '_next'):
                    return True
                try:
                    _self._next = _self._iterator.next()
                    return True
                except StopIteration:
                    return False
            def next(_self):
                if hasattr(_self, '_next'):
                    next = _self._next
                    del _self._next
                else:
                    next = _self._iterator.next()
                return next
        return _iterator()

    def remove(self, obj):
        try:
            self._set.remove(obj)
            return True
        except KeyError:
            return False

    def removeAll(self, collection):
        result = False
        for obj in collection:
            try:
                self._set.remove(obj)
                result = True
            except KeyError:
                pass
        return result

    def retainAll(self, collection):
        result = False
        for obj in list(self._set):
            if obj not in collection:
                self._set.remove(obj)
                result = True
        return result

    def size(self):
        return len(self._set)

    def toArray(self):
        """
        convert the set to a lucene.JArray
        wrapping all elements into their Java 'counterpart'
        if the type is a primitive datatype (bool, float, int, long)
        note: we assume all elements have same type!
        """
        l = list(self._set)
        if l:
            t = type(l[0])
            if t in _types:
                l = map(_types[t], l)
        return JArray(object)(l)



class JavaListIterator(PythonListIterator):
    """
    This class implements java.util.ListIterator for a Python list instance it
    wraps. (simple bidirectional iterator)
    """
    def __init__(self, _lst, index=0):
        super(JavaListIterator, self).__init__()
        self._lst = _lst
        self._lastIndex = -1 # keep state for remove/set
        self.index = index

    def next(self):
        if self.index >= len(self._lst):
            raise JavaError, NoSuchElementException(str(self.index))
        result = self._lst[self.index]
        self._lastIndex = self.index
        self.index += 1
        return result

    def previous(self):
        if self.index <= 0:
            raise JavaError, NoSuchElementException(str(self.index - 1))
        self.index -= 1
        self._lastIndex = self.index
        return self._lst[self.index]

    def hasPrevious(self):
        return self.index > 0

    def hasNext(self):
        return self.index < len(self._lst)

    def nextIndex(self):
        return min(self.index, len(self._lst))

    def previousIndex(self):
        return max(-1, self.index - 1)

    def add(self, element):
        """
        Inserts the specified element into the list.
        The element is inserted immediately before the element
        that would be returned by next, if any, and after the
        element that would be returned by previous, if any.
        """
        # java.util.ListIterator.add() may be called at any time, so there
        # is no IllegalStateException check on _lastIndex here.
        self._lst.insert(self.index, element)
        self.index += 1
        self._lastIndex = -1 # invalidate state

    def remove(self):
        """
        Removes from the list the last element that
        was returned by next or previous.
        """
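        if self._lastIndex < 0:
            raise JavaError, IllegalStateException("remove")
        del self._lst[self._lastIndex]
        if self._lastIndex < self.index:
            # last call was next(): the cursor moves left with the removal
            self.index -= 1
        self._lastIndex = -1 # invalidate state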

    def set(self, element):
        """
        Replaces the last element returned by next or previous
        with the specified element.
        """
        if self._lastIndex < 0:
            raise JavaError, IllegalStateException("set")
        self._lst[self._lastIndex] = element

    def __iter__(self):
        return self


class JavaList(PythonList):
    """
    This class implements java.util.List around a Python list instance it wraps.
    """

    def __init__(self, _lst):
        super(JavaList, self).__init__()
        self._lst = _lst

    def __contains__(self, obj):
        return obj in self._lst

    def __len__(self):
        return len(self._lst)

    def __iter__(self):
        return iter(self._lst)

    def add(self, index, obj):
        self._lst.insert(index, obj)

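    def addAll(self, *args):
        # Python has no method overloading: a second 'def addAll' would
        # replace the first one. Both Java signatures, addAll(Collection)
        # and addAll(int, Collection), are dispatched on argument count.
        size = len(self._lst)
        if len(args) == 1:
            self._lst.extend(args[0])
        else:
            index, collection = args
            self._lst[index:index] = collection
        return len(self._lst) > size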

    def clear(self):
        del self._lst[:]

    def contains(self, obj):
        return obj in self._lst

    def containsAll(self, collection):
        for obj in collection:
            if obj not in self._lst:
                return False
        return True

    def equals(self, collection):
        if type(self) is type(collection):
            return self._lst == collection._lst
        return False

    def get(self, index):
        if index < 0 or index >= self.size():
            raise JavaError, IndexOutOfBoundsException(str(index))
        return self._lst[index]

    def indexOf(self, obj):
        try:
            return self._lst.index(obj)
        except ValueError:
            return -1

    def isEmpty(self):
        return len(self._lst) == 0

    def iterator(self):
        class _iterator(PythonIterator):
            def __init__(_self):
                super(_iterator, _self).__init__()
                _self._iterator = iter(self._lst)
            def hasNext(_self):
                if hasattr(_self, '_next'):
                    return True
                try:
                    _self._next = _self._iterator.next()
                    return True
                except StopIteration:
                    return False
            def next(_self):
                if hasattr(_self, '_next'):
                    next = _self._next
                    del _self._next
                else:
                    next = _self._iterator.next()
                return next
        return _iterator()

    def lastIndexOf(self, obj):
        # compare with == (as indexOf does); plain Python values such as
        # int or str have no equals() method
        i = len(self._lst) - 1
        while i >= 0:
            if obj == self._lst[i]:
                break
            i -= 1
        return i

    def listIterator(self, index=0):
        return JavaListIterator(self._lst, index)

    def remove(self, obj_or_index):
        if type(obj_or_index) is int:
            return self.removeAt(int(obj_or_index))
        return self.removeObject(obj_or_index)

    def removeAt(self, pos):
        """
        Removes the element at the specified position in this list.
        Note: private method called from Java via remove(int index)
        index is already checked (or IndexOutOfBoundsException thrown)
        """
        try:
            el = self._lst[pos]
            del self._lst[pos]
            return el
        except IndexError:
            # should not happen
            return None

    def removeObject(self, obj):
        """
        Removes the first occurrence of the specified object
        from this list, if it is present
        """
        try:
            self._lst.remove(obj)
            return True
        except ValueError:
            return False

    def removeAll(self, collection):
        result = False
        for obj in collection:
            if self.removeObject(obj):
                result = True
        return result

    def retainAll(self, collection):
        result = False
        # iterate over a copy: removing from a list while iterating over
        # it would skip elements
        for obj in list(self._lst):
            if obj not in collection and self.removeObject(obj):
                result = True
        return result

    def size(self):
        return len(self._lst)

    def toArray(self):
        """
        convert the list to a lucene.JArray
        wrapping all elements into their Java 'counterpart'
        if the type is a primitive datatype (bool, float, int, long)
        note: we assume all elements have same type!
        """
        l = self._lst
        if l:
            t = type(l[0])
            if t in _types:
                l = map(_types[t], l)
        return JArray(object)(l)

    def subListChecked(self, fromIndex, toIndex):
        """
        Note: private method called from Java via subList()
        from/to index are already checked (or IndexOutOfBoundsException thrown)
        also IllegalArgumentException is thrown if the endpoint indices
        are out of order (fromIndex > toIndex)
        """
        sublst = self._lst[fromIndex:toIndex]
        return JavaList(sublst)

    def set(self, index, obj):
        if index < 0 or index >= self.size():
            raise JavaError, IndexOutOfBoundsException(str(index))
        self._lst[index] = obj
import sys, os
import lucene

print 'Python ', sys.version
print 'lucene ', lucene.VERSION

from lucene.collections import JavaSet, JavaList


class Foo(object):

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "<Foo:name=%s>" % self.name


def _testSet(s, typ=''):
    print "\nSet of  %s: %s" % (typ, s)
    # create python wrapper for Java Set
    # NOTE: this Python class extends/implements the Java class lucene.PythonSet
    elem = list(s)[0]
    ps = JavaSet(s)
    print "created:", ps, type(ps)
    size = ps.size()
    print "size: " , size
    assert(size == len(s)), "size"
    has = ps.contains(elem)
    print "contains('%s'): %s" % (elem, has)
    assert(has is True), "contains"
    # add existing element
    assert(ps.add(elem) is False), "do not add existing element"
    assert(size == ps.size()), "size has not changed"

    # iterate
    #iter = ps.iterator()
    #print "iterator:", iter, type(iter)
    #while (iter.hasNext()):
    #    print "next:", iter.next()

    # create HashSet in JVM
    js = lucene.HashSet(ps)
    print "created HashSet:", js, type(js)
    assert(size == js.size()), "HashSet has same size"
    assert(js.contains(elem) is True), "contains"

    # create JArray in JVM from HashSet
    jar = js.toArray()
    print "created JArray:", jar, type(jar)
    assert(size == len(jar)), "size"

    # create JArray in JVM from JavaSet
    #ar = ps.toArray()
    #print "toArray:", ar

    # create ArrayList in JVM
    jl = lucene.ArrayList(ps)
    print "created ArrayList:", jl, type(jl)
    assert(size == jl.size()), "ArrayList has same size"
    #sl = jl.subList(1, 3)
    #print "sublist:", sl


def testObjectSet():
    _testSet(set([Foo('a'), Foo('b'), Foo('c')]),'Object')

def testStringSet():
    _testSet(set(['a', 'b', 'c']),'String')

def testFloatSet():
    _testSet(set([1.3, 4.5,-0.56]),'Float')

def testBoolSet():
    _testSet(set([True, False]),'Bool')

def testIntSet():
    _testSet(set(range(10)),'Int')

def _testList(l, typ=''):
    print "\nList of  %s: %s" % (typ, l)
    # create python wrapper for Java List
    # NOTE: this Python class extends/implements the Java class lucene.PythonList
    pl = JavaList(l)
    elem0 = l[0]
    elem1 = l[1]
    print "created:", pl, type(pl)
    size = pl.size()
    print "size:", size
    assert(size == len(l)), "size"
    pos0 = pl.indexOf(elem0)
    print "indexOf first element: %s is %d" % (elem0, pos0)
    assert(pos0 == 0), "indexOf first element"
    pos1 = pl.indexOf(elem1)
    print "indexOf second element: %s is %d" % (elem1, pos1)
    assert(pos1 == 1), "indexOf second element"

    # iterate
    #iter = pl.iterator()
    #print "iterator:", iter, type(iter)
    #while (iter.hasNext()):
    #    print "next:", iter.next()

    # create HashSet in JVM
    js = lucene.HashSet(pl)
    print "created HashSet:", js, type(js)
    assert(size == js.size()), "size"
    assert(js.contains(elem0) is True), "contains"

    # create JArray in JVM from HashSet
    jar = js.toArray()
    print "created JArray:", jar, type(jar)
    assert(size == len(jar)), "size"

    # create JArray in JVM from JavaList
    #ar = pl.toArray()
    #print "toArray:", ar

    # create ArrayList in JVM from Array
    jl = lucene.ArrayList(pl)
    print "created ArrayList:", jl, type(jl)
    assert(size == jl.size()), "size"

    ## elem1 = jl.get(1)
    ## print "indexOf second element: %s (%s) is %d" % (elem1,type(elem1), jl.indexOf(elem1))
    ## NOTE: this currently fails!
    ## assert(1 == jl.indexOf(elem1)), "indexOf"
    ## assert(2 == jl.indexOf(jl.get(2))), "indexOf(2)"
    ## NOTE: this shows same object references in the ArrayList!
    ## for i in range(size):
    ##        elem = elem1 = jl.get(i)
    ##        print "element #%d: %s (%s) indexOf:%d" % (i,elem, type(elem), jl.indexOf(elem))

    #sl = jl.subList(1, 3)
    #print "sublist:", sl


def testObjectList():
    _testList([Foo('a'), Foo('b'), Foo('c')],'Object')

def testStringList():
    _testList(['a', 'b', 'c'],'String')

def testFloatList():
    _testList([1.3, 4.5,-0.56],'Float')

def testBoolList():
    _testList([True, False],'Bool')

def testIntList():
    _testList(range(10),'Int')


if __name__ == '__main__':
    lucene.initVM()

    # passing python objects into java is not supported
    # (unless a java 'base' class is defined and extended,
    #  cf. package org.apache.pylucene.util for example)
    # testObjectSet()
    # testObjectList()

    # JavaSet:
    testStringSet()
    testFloatSet()
    testBoolSet()
    testIntSet()
    # JavaList:
    testStringList()
    testFloatList()
    testBoolList()
    testIntList()
