mwlib: reworking re2c files to use ctypes

Martin Langhoff Fri, 28 Jan 2011 04:56:29 -0800

Hi Ralf, Volker,

writing to you as you seem to be active maintainers of mwlib and the
re2c files.


OLPC ships an early version of mwlib in its WikiBrowse (aka
Wikiserver) activity, and it's a tool of major importance. (Thanks for
your code! Having a nice wikislice on the many XOs that have little or
no connectivity makes a huge impact out there.)

The compiled .so files are a bit of a problem currently for us. We
ship "activities" (user-installable program bundles) that are usually
pure python, and (if prepared carefully) can be installed in several
releases of our OSs, which in turn are based on various Fedora
releases.

Binaries are not recommended inside of those bundles, but if they link
to generic libs with stable API/ABI, things are generally ok.

The re2c binaries from mwlib, unfortunately, interface with Python
using swig, which means that they end up linking directly to
libpython. We use Python extensively, so we update somewhat
aggressively to the latest version in Fedora. So what happens is that
those SO files end up being tied to specific versions.

There is a different, better way to do this -- to create standalone
.so files, and to use them from Python using ctypes. That way, we can
distribute precompiled .so files that are significantly more portable
(they are still arch and glibc ABI specific).
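To illustrate the general approach, here is a minimal sketch that calls a function from a plain shared library through ctypes, with no libpython linkage on the library side. It uses the system C library and its wcslen() as a stand-in, since that is available on any glibc system; it is not part of mwlib.

```python
import ctypes
import ctypes.util

# Locate the C library by name rather than hard-coding a versioned path.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t wcslen(const wchar_t *s);
# Without argtypes/restype, ctypes assumes int arguments and an int result.
libc.wcslen.argtypes = [ctypes.c_wchar_p]
libc.wcslen.restype = ctypes.c_size_t

# A unicode string is passed to the C side as a wchar_t pointer.
length = libc.wcslen(u"wikislice")
print(length)
```

The shared library only needs a plain C ABI, so the same .so can be loaded by whichever Python version the OS release ships.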

Would that be of interest to you? Has anyone thought about this, or
worked on this?

If so, I have done some initial hacking on this that you might be
interested in. I have attached a WIP patch against an earlier version
of your _expander.re; it drops a lot of the glue:

 mwlib/Makefile     |    4 +-
 mwlib/_expander.re |   75 ++++++++++-----------------------------------------
 2 files changed, 17 insertions(+), 62 deletions(-)

It is not finished, definitely work-in-progress. Once it works, you can just use

 import ctypes
 # CDLL() loads the library itself; no separate LoadLibrary() call is needed
 _expander = ctypes.CDLL('_expander.so')
 # pass a unicode string, since scan() takes a wchar_t *
 _expander.scan(u'foo')

And the same for _uscan.re.
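One thing the WIP patch still has to solve is that ctypes cannot receive a std::vector<Token>, and a C++ function needs extern "C" linkage to be found by name. Below is a rough sketch of what the Python side might look like, assuming a hypothetical C-linkage entry point scan_tokens(text, out, max_tokens) that fills a caller-supplied buffer and returns the token count. The name, the signature and the int-only Token layout are assumptions for illustration; only the (type, start, len) triple comes from the original py_scan().

```python
import ctypes


class Token(ctypes.Structure):
    # Mirrors the (type, start, len) triple that the old py_scan() returned.
    # The real C++ Token layout may differ; this one is assumed.
    _fields_ = [("type", ctypes.c_int),
                ("start", ctypes.c_int),
                ("len", ctypes.c_int)]


def bind(lib):
    """Declare the signature of the hypothetical entry point:
    extern "C" int scan_tokens(const wchar_t *text, Token *out, int max_tokens);
    """
    fn = lib.scan_tokens
    fn.argtypes = [ctypes.c_wchar_p, ctypes.POINTER(Token), ctypes.c_int]
    fn.restype = ctypes.c_int
    return fn


def scan(scan_tokens, text, max_tokens=None):
    """Call scan_tokens and return a list of (type, start, len) tuples."""
    if max_tokens is None:
        # Assumes every token consumes at least one character, plus one
        # spare slot for a possible end-of-input token.
        max_tokens = len(text) + 1
    buf = (Token * max_tokens)()          # zero-initialised output buffer
    count = scan_tokens(text, buf, max_tokens)
    return [(t.type, t.start, t.len) for t in buf[:count]]
```

With the library built, usage would be along the lines of scan(bind(ctypes.CDLL('_expander.so')), u'foo'), which gives the same list of tuples the current extension module returns.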

I now see that in your latest code you are actually not using
_expander.re anymore. How does the Python-based tokenizer perform,
compared to the re2c tokenizer? We care a lot about keeping things
fast.

cheers,



m
-- 
 [email protected] -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

diff --git a/mwlib/Makefile b/mwlib/Makefile
index 6f244ef..ff4c424 100644
--- a/mwlib/Makefile
+++ b/mwlib/Makefile
@@ -2,8 +2,8 @@ RE2C = re2c -w --no-generation-date
 
 all: _expander.cc _mwscan.cc _mwscan.so _expander.so
 
-_expander.so: _expander.cc
-	(cd .. && python ./setup.py build_ext --inplace build)
+expander.so: _expander.cc
+	g++ -shared _expander.o -o expander.so
 
 _mwscan.so: _mwscan.cc
 	(cd .. && python ./setup.py build_ext --inplace build)
diff --git a/mwlib/_expander.re b/mwlib/_expander.re
index 7abb2ac..64865a0 100644
--- a/mwlib/_expander.re
+++ b/mwlib/_expander.re
@@ -2,12 +2,9 @@
 // Copyright (c) 2007-2008 PediaPress GmbH
 // See README.txt for additional licensing information.
 
-#include <Python.h>
-
 #include <iostream>
 #include <assert.h>
 #include <vector>
-
 using namespace std;
 
 #define RET(x) {found(x); return x;}
@@ -24,7 +21,7 @@ class MacroScanner
 {
 public:
 
-	MacroScanner(Py_UNICODE *_start, Py_UNICODE *_end) {
+	MacroScanner(wchar_t *_start,  wchar_t *_end) {
 		source = start = _start;
 		end = _end;
 		cursor = start;
@@ -48,11 +45,11 @@ public:
 
 	inline int scan();
 
-	Py_UNICODE *source;
+	wchar_t *source;
 
-	Py_UNICODE *start;
-	Py_UNICODE *cursor;
-	Py_UNICODE *end;
+	wchar_t *start;
+	wchar_t *cursor;
+	wchar_t *end;
 	vector<Token> tokens;
 };
 
@@ -64,12 +61,10 @@ std:
 
 	start=cursor;
 	
-	Py_UNICODE *marker=cursor;
-
-	Py_UNICODE *save_cursor = cursor;
+	wchar_t *marker=cursor;
 
 
-#define YYCTYPE         Py_UNICODE
+#define YYCTYPE         wchar_t
 #define YYCURSOR        cursor
 #define YYMARKER	marker
 #define YYLIMIT   (end)
@@ -147,60 +142,20 @@ pre:
 }
 
 
-PyObject *py_scan(PyObject *self, PyObject *args) 
+std::vector<Token> scan(wchar_t *unistr) 
 {
-	PyObject *arg1;
-	if (!PyArg_ParseTuple(args, "O:_expander.scan", &arg1)) {
-		return 0;
-	}
-	PyUnicodeObject *unistr = (PyUnicodeObject*)PyUnicode_FromObject(arg1);
-	if (unistr == NULL) {
-		PyErr_SetString(PyExc_TypeError,
-				"parameter cannot be converted to unicode in _expander.scan");
-		return 0;
-	}
 
-	Py_UNICODE *start = unistr->str;
-	Py_UNICODE *end = start+unistr->length;
+	wchar_t *start = unistr;
+	wchar_t *end   = start+wcslen(unistr);
 	
 
 	MacroScanner scanner (start, end);
-	Py_BEGIN_ALLOW_THREADS
 	while (scanner.scan()) {
 	}
-	Py_END_ALLOW_THREADS
-	Py_XDECREF(unistr);
-	
-	// return PyList_New(0); // uncomment to see timings for scanning
 
-	int size = scanner.tokens.size();
-	PyObject *result = PyList_New(size);
-	if (!result) {
-		return 0;
-	}
-	
-	for (int i=0; i<size; i++) {
-		Token t = scanner.tokens[i];
-		PyList_SET_ITEM(result, i, Py_BuildValue("iii", t.type, t.start, t.len));
-	}
-	
-	return result;
-}
-
-
-
-static PyMethodDef module_functions[] = {
-	{"scan", (PyCFunction)py_scan, METH_VARARGS, "scan(text)"},
-	{0, 0},
+	/* int size = scanner.tokens.size();
+	if (size==0) {
+	  return 0; maybe should return NULL? instead of int?
+	  } */ 
+	return scanner.tokens;
 };
-
-
-
-extern "C" {
-	DL_EXPORT(void) init_expander();
-}
-
-DL_EXPORT(void) init_expander()
-{
-	/*PyObject *m =*/ Py_InitModule("_expander", module_functions);
-}

_______________________________________________
Devel mailing list
[email protected]
http://lists.laptop.org/listinfo/devel
