Hi Andi,
thanks for your feedback and for the code cleanup.
Regarding the 'toArray'-issue I tried different versions of JArray
'typed-constructor' and it turned out that these two alternatives basically
work:
(example for int types)
1) return JArray(object)([<lucene.Integer()-object>*])
or
2) return JArray(int)([<Python-int-literal>*])
Also surprising (to me): there are two ways to construct these
templated types, using either a type or a string:
>>> lucene.JArray(int)([1,2])
and
>>> lucene.JArray('int')([1,2])
Both create the same type, <type 'JArray_int'>.
I would have preferred option #2 in toArray (for both JavaList and
JavaSet) as it does not require wrapping the values in Java Integer (etc.)
objects. However, this approach does not work with lucene.ArrayList:
>>> x=lucene.JArray('int')([1,2])
>>> x
JArray<int>[1, 2]
>>> y=lucene.ArrayList(x)
Traceback: lucene.InvalidArgsError:
(<type 'ArrayList'>, '__init__', (JArray<int>[1, 2],))
So I decided to choose the Java-object wrapper option (#1) and implemented
toArray for primitive types (int,float,long,bool). It turned out that
wrapping strings is not needed. That way the collections-demo runs fine and
I can init a lucene.ArrayList with the JavaSet or JavaList for the mentioned
types.
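The wrapping described above can be sketched without a JVM. This is only a
minimal sketch: Boxed is a hypothetical stand-in for lucene.Integer,
lucene.Float and so on, and WRAPPERS and box_elements are illustrative names
that are not part of collections.py. Only the dispatch on the first element's
type mirrors what toArray does.

```python
class Boxed(object):
    """Hypothetical stand-in for a Java wrapper such as lucene.Integer."""

    def __init__(self, java_name, value):
        self.java_name = java_name
        self.value = value

    def __repr__(self):
        return "%s(%r)" % (self.java_name, self.value)


def _boxer(java_name):
    # Build a one-argument constructor, comparable to lucene.Integer(5).
    return lambda value: Boxed(java_name, value)


# Python type -> wrapper constructor. The lookup is by exact type, so True
# maps to Boolean even though bool is a subclass of int.
WRAPPERS = {bool: _boxer("Boolean"),
            float: _boxer("Float"),
            int: _boxer("Integer")}


def box_elements(values):
    """Wrap every element if the first one has a known primitive type.

    Like toArray, this assumes all elements share the first one's type.
    """
    values = list(values)
    if values:
        wrap = WRAPPERS.get(type(values[0]))
        if wrap is not None:
            return [wrap(v) for v in values]
    return values  # strings and other objects pass through unwrapped
```

With PyLucene loaded, the list returned here would be the argument handed to
JArray(object)(...); that call is left out so the sketch runs on its own.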
Attached is a revised version of collections.py and collections-demo.py
(which should run without error now).
However there's still one question/issue as you can see from the output of
collections-demo.py (and some commented 'test code' in collections-demo.py):
created JArray: JArray<object>[<Object: 0>, <Object: 1>, <Object: 2>,
<Object: 3>, ...] <type 'JArray_object'>
created ArrayList: [java.lang.Object@785d65, java.lang.Object@785d65,
java.lang.Object@785d65, java.lang.Object@785d65....,] <type 'ArrayList'>
It looks as if the objects passed from JavaSet to lucene.ArrayList all end up
as the same object (which is also why indexOf behaves strangely). It could
be a bug in my test code, but lucene.HashSet(JavaSet), for example, does not
show this problem, so I'm really curious what's going on here...
If you have any ideas, please let me know. I will look into it again when I
have some time, but I will be busy for most of this week and out of the office
next week.
regards,
Thomas
-----Original Message-----
From: Andi Vajda [mailto:[email protected]]
Sent: Monday, 12 March 2012 03:34
To: [email protected]
Cc: Thomas Koch
Subject: Re: AW: AW: AW: Setting Stopword Set in PyLucene (or using Set in
general)
Hi Thomas,
On Fri, 2 Mar 2012, Thomas Koch wrote:
> thanks for the feedback! I revised the code and send you attached a
> new patch.
Sorry for the delay in getting back to you.
I integrated your patch and fixed a bunch of formatting and bugs in it.
The collections-demo.py is not fully functional yet so I attach it here too,
somewhat fixed up as well.
There is a bug somewhere with constructing an ArrayList from a Python
collection like JavaSet or JavaList. At some point, toArray() gets called and
the right array is returned (almost, see below), but the ArrayList looks as if
it were built from an array of empty objects.
> I also attach a short demo script that shows the problems I mentioned
> earlier when trying to initialize an ArrayList with a JavaSet (or
> JavaList) containing integers.
For that, the toArray() methods in collections.py must create the correct
array type using int, float, etc... instead of object, based on what's in the
Python object.
Alternatively, these methods need to box the int values by
wrapping them into a Java Integer object (for example, lucene.Integer(5)).
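The first alternative, choosing the array type from the contents, could look
roughly like the sketch below. array_element_type is a hypothetical helper,
not something in collections.py, and it runs without a JVM. Whether JArray
accepts every type it returns is an assumption; this thread only shows
JArray(int) and JArray(object) being used.

```python
def array_element_type(values):
    """Return the one Python type shared by all values, else object.

    Empty or mixed collections, and types other than bool, float and int,
    fall back to object.
    """
    kinds = set(type(v) for v in values)
    if len(kinds) == 1:
        kind = kinds.pop()
        if kind in (bool, float, int):
            return kind
    return object


# With PyLucene loaded, the result would be used along the lines of
#     JArray(array_element_type(values))(values)
# That call is not made here because it needs a running JVM.
```

Checking every element, rather than only the first, avoids building an int
array from a collection that also holds other values.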
I leave that to you to continue with, I'm out of time for right now :-)
> Finally I'd suggest to rename collections.py because there's one
> defined on Python lib already:
> http://docs.python.org/library/collections.html
Until this happens, you can use:
from lucene import collections
as the collections.py file gets installed in the lucene package.
Throwing Java exceptions from Python is done by raising JavaError with the
desired Java exception object (I added a few to the jcc call in PyLucene's
Makefile), for example:
raise JavaError, NoSuchElementException(str(index))
It's been like that for a very long time, I just forgot.
This is implemented by throwPythonError() in jcc's functions.cpp: if the
error is JavaError, then the Java exception instance used as argument to it
is raised to the JVM.
I attached the not-checked-in diffs as patches. The new Makefile is checked
into the pylucene-3.x branch.
> Below are some comments to your comments...
More responses inline below.
> Ok, I was unsure on how to properly throw a Java Exception in Python
> code - and couldn't find an example.
> Also I thought a Java Exception type should be exported in lucene -
> this is not the case however:
> >>> lucene.NoSuchElementException
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'NoSuchElementException'
>
> I imagine I could
> - add the java.util.NoSuchElementException to the Makefile to get it
> generated by JCC and throw it via raise?
> - use lucene.JavaError and pass 'java.util.NoSuchElementException'
> name in the constructor?
Yes, you guessed it right, this is how it works as outlined above.
You had various bugs in next()/nextIndex(), previous()/previousIndex() that
I hopefully fixed. Also, listIterator() can't be overridden in Python, I
fixed it in PythonList and in collections.py.
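For reference, the cursor bookkeeping behind next()/previous() and
nextIndex()/previousIndex() can be sketched in plain Python. This is a toy
model and not the code in collections.py: ListCursor and its method names are
made up for illustration, and IndexError and RuntimeError stand in for
NoSuchElementException and IllegalStateException.

```python
class ListCursor(object):
    """Cursor that sits between elements, as java.util.ListIterator does."""

    def __init__(self, items, index=0):
        self.items = items
        self.index = index  # position of the element next() would return
        self.last = -1      # index last returned; -1 means "none"

    def has_next(self):
        return self.index < len(self.items)

    def next(self):
        if not self.has_next():
            raise IndexError(self.index)  # stand-in: NoSuchElementException
        self.last = self.index
        self.index += 1
        return self.items[self.last]

    def previous(self):
        if self.index <= 0:
            raise IndexError(self.index - 1)
        self.index -= 1
        self.last = self.index
        return self.items[self.last]

    def next_index(self):
        return self.index

    def previous_index(self):
        return self.index - 1

    def remove(self):
        if self.last < 0:
            raise RuntimeError("remove")  # stand-in: IllegalStateException
        del self.items[self.last]
        if self.last < self.index:
            self.index -= 1  # the removed element was before the cursor
        self.last = -1
```

Calling next() and then previous() returns the same element twice, and
remove() after next() has to move the cursor back by one so that no element
is skipped.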
Andi..
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from lucene import JArray, Boolean, Float, Integer, Long, String, \
    PythonSet, PythonList, PythonIterator, PythonListIterator, JavaError, \
    NoSuchElementException, IllegalStateException, IndexOutOfBoundsException

_types = { bool: Boolean,
           float: Float,
           int: Integer,
           long: Long,
           # wrapping strings is not needed - actually the 'wrapper'
           # lucene.String('xxx') does not exist!
           # str: String
         }

class JavaSet(PythonSet):
    """
    This class implements java.util.Set around a Python set instance it wraps.
    """

    def __init__(self, _set):
        super(JavaSet, self).__init__()
        self._set = _set

    def __contains__(self, obj):
        return obj in self._set

    def __len__(self):
        return len(self._set)

    def __iter__(self):
        return iter(self._set)

    def add(self, obj):
        if obj not in self._set:
            self._set.add(obj)
            return True
        return False

    def addAll(self, collection):
        size = len(self._set)
        self._set.update(collection)
        return len(self._set) > size

    def clear(self):
        self._set.clear()

    def contains(self, obj):
        return obj in self._set

    def containsAll(self, collection):
        for obj in collection:
            if obj not in self._set:
                return False
        return True

    def equals(self, collection):
        if type(self) is type(collection):
            return self._set == collection._set
        return False

    def isEmpty(self):
        return len(self._set) == 0

    def iterator(self):
        class _iterator(PythonIterator):
            def __init__(_self):
                super(_iterator, _self).__init__()
                _self._iterator = iter(self._set)
            def hasNext(_self):
                # look ahead by one element and buffer it for next()
                if hasattr(_self, '_next'):
                    return True
                try:
                    _self._next = _self._iterator.next()
                    return True
                except StopIteration:
                    return False
            def next(_self):
                if hasattr(_self, '_next'):
                    next = _self._next
                    del _self._next
                else:
                    next = _self._iterator.next()
                return next
        return _iterator()

    def remove(self, obj):
        try:
            self._set.remove(obj)
            return True
        except KeyError:
            return False

    def removeAll(self, collection):
        result = False
        for obj in collection:
            try:
                self._set.remove(obj)
                result = True
            except KeyError:
                pass
        return result

    def retainAll(self, collection):
        result = False
        # iterate over a copy because the set is modified in the loop
        for obj in list(self._set):
            if obj not in collection:
                self._set.remove(obj)
                result = True
        return result

    def size(self):
        return len(self._set)

    def toArray(self):
        """
        convert the set to a lucene.JArray
        wrapping all elements into their Java 'counterpart'
        if the type is a primitive datatype (bool, float, int, long)
        note: we assume all elements have same type!
        """
        l = list(self._set)
        if l:
            t = type(l[0])
            if t in _types:
                l = map(_types[t], l)
        return JArray(object)(l)

class JavaListIterator(PythonListIterator):
    """
    This class implements java.util.ListIterator for a Python list instance it
    wraps. (simple bidirectional iterator)
    """

    def __init__(self, _lst, index=0):
        super(JavaListIterator, self).__init__()
        self._lst = _lst
        self._lastIndex = -1 # keep state for remove/set
        self.index = index

    def next(self):
        if self.index >= len(self._lst):
            raise JavaError, NoSuchElementException(str(self.index))
        result = self._lst[self.index]
        self._lastIndex = self.index
        self.index += 1
        return result

    def previous(self):
        if self.index <= 0:
            raise JavaError, NoSuchElementException(str(self.index - 1))
        self.index -= 1
        self._lastIndex = self.index
        return self._lst[self.index]

    def hasPrevious(self):
        return self.index > 0

    def hasNext(self):
        return self.index < len(self._lst)

    def nextIndex(self):
        return min(self.index, len(self._lst))

    def previousIndex(self):
        return max(-1, self.index - 1)

    def add(self, element):
        """
        Inserts the specified element into the list.
        The element is inserted immediately before the next element
        that would be returned by next, if any, and after the
        element that would be returned by previous, if any.
        """
        # java.util.ListIterator.add() may be called at any time, so there
        # is no IllegalStateException check here (unlike remove/set)
        self._lst.insert(self.index, element)
        self.index += 1
        self._lastIndex = -1 # invalidate state

    def remove(self):
        """
        Removes from the list the last element that
        was returned by next or previous.
        """
        if self._lastIndex < 0:
            raise JavaError, IllegalStateException("remove")
        del self._lst[self._lastIndex]
        if self._lastIndex < self.index:
            # element was returned by next(): move the cursor back by one
            self.index -= 1
        self._lastIndex = -1 # invalidate state

    def set(self, element):
        """
        Replaces the last element returned by next or previous
        with the specified element.
        """
        if self._lastIndex < 0:
            raise JavaError, IllegalStateException("set")
        self._lst[self._lastIndex] = element

    def __iter__(self):
        return self

class JavaList(PythonList):
    """
    This class implements java.util.List around a Python list instance it wraps.
    """

    def __init__(self, _lst):
        super(JavaList, self).__init__()
        self._lst = _lst

    def __contains__(self, obj):
        return obj in self._lst

    def __len__(self):
        return len(self._lst)

    def __iter__(self):
        return iter(self._lst)

    def add(self, index, obj):
        self._lst.insert(index, obj)

    def addAll(self, collection):
        size = len(self._lst)
        self._lst.extend(collection)
        return len(self._lst) > size

    # NOTE: Python has no method overloading, so this definition replaces
    # the one-argument addAll above
    def addAll(self, index, collection):
        size = len(self._lst)
        self._lst[index:index] = collection
        return len(self._lst) > size

    def clear(self):
        del self._lst[:]

    def contains(self, obj):
        return obj in self._lst

    def containsAll(self, collection):
        for obj in collection:
            if obj not in self._lst:
                return False
        return True

    def equals(self, collection):
        if type(self) is type(collection):
            return self._lst == collection._lst
        return False

    def get(self, index):
        if index < 0 or index >= self.size():
            raise JavaError, IndexOutOfBoundsException(str(index))
        return self._lst[index]

    def indexOf(self, obj):
        try:
            return self._lst.index(obj)
        except ValueError:
            return -1

    def isEmpty(self):
        return len(self._lst) == 0

    def iterator(self):
        class _iterator(PythonIterator):
            def __init__(_self):
                super(_iterator, _self).__init__()
                _self._iterator = iter(self._lst)
            def hasNext(_self):
                # look ahead by one element and buffer it for next()
                if hasattr(_self, '_next'):
                    return True
                try:
                    _self._next = _self._iterator.next()
                    return True
                except StopIteration:
                    return False
            def next(_self):
                if hasattr(_self, '_next'):
                    next = _self._next
                    del _self._next
                else:
                    next = _self._iterator.next()
                return next
        return _iterator()

    def lastIndexOf(self, obj):
        # compare with ==, as indexOf does; a Python value has no equals()
        i = len(self._lst) - 1
        while i >= 0:
            if obj == self._lst[i]:
                break
            i -= 1
        return i

    def listIterator(self, index=0):
        return JavaListIterator(self._lst, index)

    def remove(self, obj_or_index):
        if type(obj_or_index) is type(1):
            return self.removeAt(int(obj_or_index))
        return self.removeObject(obj_or_index)

    def removeAt(self, pos):
        """
        Removes the element at the specified position in this list.
        Note: private method called from Java via remove(int index)
        index is already checked (or IndexOutOfBoundsException thrown)
        """
        try:
            el = self._lst[pos]
            del self._lst[pos]
            return el
        except IndexError:
            # should not happen
            return None

    def removeObject(self, obj):
        """
        Removes the first occurrence of the specified object
        from this list, if it is present
        """
        try:
            self._lst.remove(obj)
            return True
        except ValueError:
            return False

    def removeAll(self, collection):
        result = False
        for obj in collection:
            if self.removeObject(obj):
                result = True
        return result

    def retainAll(self, collection):
        result = False
        # iterate over a copy because the list is modified in the loop
        for obj in list(self._lst):
            if obj not in collection and self.removeObject(obj):
                result = True
        return result

    def size(self):
        return len(self._lst)

    def toArray(self):
        """
        convert the list to a lucene.JArray
        wrapping all elements into their Java 'counterpart'
        if the type is a primitive datatype (bool, float, int, long)
        note: we assume all elements have same type!
        """
        l = self._lst
        if l:
            t = type(l[0])
            if t in _types:
                l = map(_types[t], l)
        return JArray(object)(l)

    def subListChecked(self, fromIndex, toIndex):
        """
        Note: private method called from Java via subList()
        from/to index are already checked (or IndexOutOfBoundsException thrown)
        also IllegalArgumentException is thrown if the endpoint indices
        are out of order (fromIndex > toIndex)
        """
        sublst = self._lst[fromIndex:toIndex]
        return JavaList(sublst)

    def set(self, index, obj):
        if index < 0 or index >= self.size():
            raise JavaError, IndexOutOfBoundsException(str(index))
        self._lst[index] = obj

import sys, os
import lucene

print 'Python ', sys.version
print 'lucene ', lucene.VERSION

from lucene.collections import JavaSet, JavaList

class Foo(object):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return "<Foo:name=%s>" % self.name

def _testSet(s, typ=''):
    print "\nSet of %s: %s" % (typ, s)
    # create python wrapper for Java Set
    # NOTE: this Python class extends/implements the Java class lucene.PythonSet
    elem = list(s)[0]
    ps = JavaSet(s)
    print "created:", ps, type(ps)
    size = ps.size()
    print "size: " , size
    assert(size == len(s)), "size"
    has = ps.contains(elem)
    print "contains('%s'): %s" % (elem, has)
    assert(has is True), "contains"
    # add existing element
    assert(ps.add(elem) is False), "do not add existing element"
    assert(size == ps.size()), "size has not changed"
    # iterate
    #iter = ps.iterator()
    #print "iterator:", iter, type(iter)
    #while (iter.hasNext()):
    #    print "next:", iter.next()
    # create HashSet in JVM
    js = lucene.HashSet(ps)
    print "created HashSet:", js, type(js)
    assert(size == js.size()), "HashSet has same size"
    assert(js.contains(elem) is True), "contains"
    # create JArray in JVM from HashSet
    jar = js.toArray()
    print "created JArray:", jar, type(jar)
    assert(size == len(jar)), "size"
    # create JArray in JVM from JavaSet
    #ar = ps.toArray()
    #print "toArray:", ar
    # create ArrayList in JVM
    jl = lucene.ArrayList(ps)
    print "created ArrayList:", jl, type(jl)
    assert(size == jl.size()), "ArrayList has same size"
    #sl = jl.subList(1, 3)
    #print "sublist:", sl

def testObjectSet():
    _testSet(set([Foo('a'), Foo('b'), Foo('c')]),'Object')

def testStringSet():
    _testSet(set(['a', 'b', 'c']),'String')

def testFloatSet():
    _testSet(set([1.3, 4.5,-0.56]),'Float')

def testBoolSet():
    _testSet(set([True, False]),'Bool')

def testIntSet():
    _testSet(set(range(10)),'Int')

def _testList(l, typ=''):
    print "\nList of %s: %s" % (typ, l)
    # create python wrapper for Java List
    # NOTE: this Python class extends/implements the Java class lucene.PythonList
    pl = JavaList(l)
    elem0 = l[0]
    elem1 = l[1]
    print "created:", pl, type(pl)
    size = pl.size()
    print "size:", size
    assert(size == len(l)), "size"
    pos0 = pl.indexOf(elem0)
    print "indexOf first element: %s is %d" % (elem0, pos0)
    assert(pos0 == 0), "indexOf first element"
    pos1 = pl.indexOf(elem1)
    print "indexOf second element: %s is %d" % (elem1, pos1)
    assert(pos1 == 1), "indexOf second element"
    # iterate
    #iter = pl.iterator()
    #print "iterator:", iter, type(iter)
    #while (iter.hasNext()):
    #    print "next:", iter.next()
    # create HashSet in JVM
    js = lucene.HashSet(pl)
    print "created HashSet:", js, type(js)
    assert(size == js.size()), "size"
    assert(js.contains(elem0) is True), "contains"
    # create JArray in JVM from HashSet
    jar = js.toArray()
    print "created JArray:", jar, type(jar)
    assert(size == len(jar)), "size"
    # create JArray in JVM from JavaList
    #ar = pl.toArray()
    #print "toArray:", ar
    # create ArrayList in JVM from Array
    jl = lucene.ArrayList(pl)
    print "created ArrayList:", jl, type(jl)
    assert(size == jl.size()), "size"
    ## elem1 = jl.get(1)
    ## print "indexOf second element: %s (%s) is %d" % (elem1,type(elem1), jl.indexOf(elem1))
    ## NOTE: this currently fails!
    ## assert(1 == jl.indexOf(elem1)), "indexOf"
    ## assert(2 == jl.indexOf(jl.get(2))), "indexOf(2)"
    ## NOTE: this shows same object references in the ArrayList!
    ## for i in range(size):
    ##     elem = elem1 = jl.get(i)
    ##     print "element #%d: %s (%s) indexOf:%d" % (i,elem, type(elem), jl.indexOf(elem))
    #sl = jl.subList(1, 3)
    #print "sublist:", sl

def testObjectList():
    _testList([Foo('a'), Foo('b'), Foo('c')],'Object')

def testStringList():
    _testList(['a', 'b', 'c'],'String')

def testFloatList():
    _testList([1.3, 4.5,-0.56],'Float')

def testBoolList():
    _testList([True, False],'Bool')

def testIntList():
    _testList(range(10),'Int')

if __name__ == '__main__':
    lucene.initVM()
    # passing python objects into java is not supported
    # (unless a java 'base' class is defined and extended,
    # cf. package org.apache.pylucene.util for example)
    # testObjectSet()
    # testObjectList()
    # JavaSet:
    testStringSet()
    testFloatSet()
    testBoolSet()
    testIntSet()
    # JavaList:
    testStringList()
    testFloatList()
    testBoolList()
    testIntList()