On Tue, Sep 10, 2013 at 10:31:02PM +0200, Julian Andres Klode wrote: > On Tue, Sep 10, 2013 at 05:24:14PM +0200, Michael Vogt wrote: > > On Mon, Sep 09, 2013 at 02:14:56PM +0200, Jakub Wilk wrote: > > > * Julian Andres Klode <j...@debian.org>, 2013-08-26, 13:48: > > > >>The apt_inst.TarFile.extractdata method requires that it's > > > >>argument is a str. It would be nice if it also accepted bytes > > > >>objects. This would be consistent with e.g. built-in open > > > >>function, which accepts one or the other as filename. > > > > > > > >Thank you for your bug report; and yes, you're right. We are using > > > >PyArg_ParseTuple() to parse the string. We currently use the > > > >format string "s" for parsing file names. If you happen to know > > > >the best format string we could use that accepts bytes strings as > > > >well, please let us know. It has to work with Python 2.7 as well. > > > > > > The documentation for the "s" format reads: "This format does not > > > accept bytes-like objects. If you want to accept filesystem paths > > > and convert them to C character strings, it is preferable to use the > > > O& format with PyUnicode_FSConverter() as converter." > > > > > > PyUnicode_FSConverter is not available in Python 2.X, though. > > > > The attached patch should implement this for Python 3.1+, python2.X is > > unchanged (for now). Feedback/review welcome! > > The problem is that there are various other places where we deal with > filenames. It does not make sense to fix this one function, but not > the other ones. > > And it seems to be a memory leak, because py_member needs to be > freed. > > What we could probably do is to write a custom converter that > takes care of the memory management and also works on Python 2. The > encoder would > (a) if needed: convert unicode to bytes using PyUnicode_EncodeFSDefault > (b) create something like the following from the bytes object, where the > a reference to the bytes object is stored in the object member (the > code here is not complete, it's only a quick draft): > > class PyApt_Filename { > public: > PyObject *object; > char *path; > > PyApt_Filename(PyObject *object, char *path) { > this->object = object; > this->path = path; > } > > ~PyApt_Filename() { > Py_DECREF(object); > } > > static int Converter(PyObject *object, void *out) { > PyApt_Filename *real_out = (PyApt_Filename *) out; > > if (PyUnicode_Check(object)) { > object = PyUnicode_EncodeFSDefault(object); > } else { > Py_INCREF(object); > } > > if (!PyBytes_Check(object)) { > ... <set exception> ... > return 0; > } > > real_out->object = object; > real_out->path = PyBytes_AS_STRING(object); > > return 1; > } > } > > (we can also use C++ magic stuff and drop the path member, and > overload an operator for use as char* if we want) > > By storing this on the stack we have automatic memory management > > PyApt_Filename path; > if (PyArg_ParseTuple(args,"O&",PyApt_Filename::Converter, &path) == 0) > .... < use path.path here to access file path > ... > .... cleanup happens automatically ... > > I'll try to implement this tomorrow. >
It took a bit longer, but I implemented this in the (temporary and to be rebased) wip/bytes-path branch of the apt/python-apt.git repository. I pushed it there so everyone can have a look. It's missing test cases currently, and is not fully tested, I'll add those later (definitely before pushing to master). -- Julian Andres Klode - Debian Developer, Ubuntu Member See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/. -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org