On Mon, 30 Aug 2010, technology inspired wrote:
Thanks for the reply. My example runs fine when it runs alone (pure python).
Here is the code:
Ok, then the next step is to port it to a Python HTTP server such as [1] so
that you get the threading and initialization story straight:
- initVM() must be called from the main thread, once
- any thread created from Python must call attachCurrentThread() before
making any other calls that involve the JVM
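A rough sketch of those two rules, for illustration only. FakeEnv, start_vm(), index_worker() and run_workers() are hypothetical names, and FakeEnv stands in for the object returned by lucene.initVM() so that the snippet runs without a JVM. With PyLucene installed, start_vm() would call lucene.initVM() instead.

```python
import threading


class FakeEnv(object):
    """Hypothetical stand-in for the env object returned by lucene.initVM()."""

    def __init__(self):
        self.attached = set()
        self._lock = threading.Lock()

    def attachCurrentThread(self):
        # Record which threads attached, so the pattern can be checked.
        with self._lock:
            self.attached.add(threading.current_thread().name)


def start_vm():
    # Assumption: with PyLucene this would be `import lucene; return lucene.initVM()`.
    # Call it exactly once, from the main thread.
    return FakeEnv()


def index_worker(env, results, key):
    # First thing a Python-created thread does: attach itself to the JVM.
    env.attachCurrentThread()
    # JVM-backed work (IndexWriter, IndexSearcher, ...) would go here.
    results[key] = "done in %s" % threading.current_thread().name


def run_workers(env, count):
    results = {}
    threads = [
        threading.Thread(target=index_worker, args=(env, results, i),
                         name="worker-%d" % i)
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    env = start_vm()              # main thread, once
    print(run_workers(env, 3))    # each worker attaches before doing any work
```

The point of the sketch is the ordering: the VM is started once up front, and every other thread attaches before it touches anything Java-backed.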
I'm not sure how this is done in the apache2/wsgi environment, that is a
question for another forum. That being said, if you solve this problem,
posting your answer here would be helpful as this has come up before.
About the errors you're reporting, what you're seeing in your browser is
irrelevant. Instead, you must log errors that happen on the Python side and
look for the stack traces there.
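One possible way to capture those Python-side errors is a small WSGI wrapper that logs the traceback before answering with a 500. This is a sketch, not something specific to mod_wsgi or PyLucene: log_exceptions() and the logger name "pylucene_app" are hypothetical, and where the log output ends up depends on how logging is configured in your setup. It also cannot see a crash that kills the whole process (such as the abort or segmentation fault in the Apache log), only exceptions raised in Python.

```python
import logging
import sys

logger = logging.getLogger("pylucene_app")  # hypothetical logger name


def log_exceptions(app):
    """Wrap a WSGI app so exceptions raised while calling it are logged."""

    def wrapper(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # logger.exception() records the message plus the full traceback.
            logger.exception("Unhandled error for %s", environ.get("PATH_INFO"))
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain")],
                sys.exc_info(),
            )
            return [b"Internal Server Error"]

    return wrapper
```

Usage would be along the lines of `application = log_exceptions(application)` in the WSGI script file. Note that the wrapper only covers exceptions raised while the app is being called, not ones raised later while the response body is iterated.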
Andi..
[1] http://docs.python.org/library/simplehttpserver.html
#import sys, os
#sys.path.append("/home/v/workspace/example-project/src/trunk")
#os.environ['DJANGO_SETTINGS_MODULE'] = 'example.settings'
from lucene import (Field, Document, initVM, NIOFSDirectory, IndexWriter,
                    StandardAnalyzer, Version, File)
from lucene import (SimpleFSLockFactory, NumericField, IndexSearcher,
                    QueryParser, NumericRangeQuery)
from lucene import Integer, BooleanQuery, BooleanClause
#from django.shortcuts import render_to_response

def build():
    initVM()
    dir = NIOFSDirectory(File("/home/v/index"), SimpleFSLockFactory())
    analyzer = StandardAnalyzer(Version.LUCENE_30)
    writer = IndexWriter(dir, analyzer, True,
                         IndexWriter.MaxFieldLength(1024))
    field_rows = FieldDoc.objects.all()  # Currently there is only one row in database
    for row in field_rows:
        doc = Document()
        if row.category != "":
            doc.add(Field('category', row.category, Field.Store.YES,
                          Field.Index.NOT_ANALYZED))
        writer.addDocument(doc)
    writer.close()
    #return render_to_response("index.html", {"var": "Success"})
But when I connect it to httpd/mod_wsgi, I sometimes see the "Success" page
and other times get "Internal Server Error", with the errors mentioned in my
previous email. I don't know what the best practice is for running PyLucene
code from a web server.
You mentioned using attachCurrentThread(). I tried using it this way:
    env = initVM()
    env.attachCurrentThread()
but the response did not change. I don't know if this is how
attachCurrentThread() should be used in the build function above. Please
advise on how to connect Lucene code with Apache2/wsgi. My apache2/wsgi is
configured properly, as I can serve web pages that don't use Lucene. Apache2
is using mpm-worker, a threaded environment.
Thanks.
Regards,
Vin
On Sun, Aug 29, 2010 at 12:21 PM, Andi Vajda wrote:
On Sun, 29 Aug 2010, technology inspired wrote:
I am using PyLucene 3.0.2 on Ubuntu 10.04 with Python 2.6.5 and Sun Java
1.6. I have written an example script that builds an index and stores it in
a directory. Later on, I want to search it from a second example script,
which I haven't written yet.
I have two issues and am looking for your help:
ISSUE 1:
I am using Apache2 with mod_wsgi 3.3. I have connected the index building
script to a GET request. When I issue that GET request, I get the following
errors:
[error] [client 127.0.0.1] Premature end of script headers: wsgi
[notice] child pid exit signal Aborted (6).
With this error, I see "Internal Server Error" in my browser. The error
appears only if I make GET requests very often, i.e. around one every 2
seconds. If I issue GET requests at 10 second intervals, I don't see these
errors.
ISSUE 2:
When I index a Date field using NumericField, the GET request gives
"Internal Server Error" on every alternate request, and the Apache2 log
files show these errors:
[error] [client 127.0.0.1] Premature end of script headers: wsgi
[notice] child pid exit signal Segmentation fault (11)
I am looking for help solving these problems. I am running WSGI in daemon
mode. The WSGI settings are:
...
WSGIDaemonProcess example.com user=www-data group=www-data threads=25
WSGIProcessGroup example.com
WSGIScriptAlias / /home/user1/workspace/http_wsgi/wsgi
...
Please advise on how to run PyLucene based code (searching, indexing, etc.)
from Apache2 mod_wsgi.
First, get your application to work outside of apache2/wsgi, as a
plain Python program. Then, once it's debugged, adapt it to the
apache2/wsgi environment. And, last but not least, if you are using
threads, be sure to call attachCurrentThread() [1] before calling into
the JVM.
Andi..
[1] http://lucene.apache.org/pylucene/jcc/documentation/readme.html#api