Hello again,

I've created a new repository which contains some tests I had written for the HTTP movie combined page parser. Most of the 70+ tests pass on Python 3.3 to 3.6, both with and without lxml installed. Pretty good start.
https://github.com/uyar/imdbpy-tests

To run, just type "tox". This assumes that you have the python3.3, python3.4, python3.5 and python3.6 executables in your path. To test only one environment, select it with -e, as in "tox -e py35". When a test downloads a page from IMDb, it caches the page in the directory tests/.cache, so subsequent runs will not download it again.

More tests are definitely welcome.

Bye,

-- Turgut

On 11/05/2017 09:16 PM, H. Turgut Uyar wrote:
> Hi,
>
> Great work, thanks! What is the minimal supported Python 3 version?
>
> I would rather have bsoup removed at the moment and maybe added back
> later. Currently the bsoup and lxml parsers require different
> preprocessors because their parsers come up with different DOM trees.
> When I refactored the extractors-attributes in IMDbPY into a separate
> package (piculet) I went another way: first I try to normalize the HTML
> code so that parsers will parse it the same way, then I apply the
> extraction rules. In my tests, piculet worked alright on the IMDb
> markup. Its syntax is very close to the extractors-attributes syntax in
> IMDbPY, it supports both Python 2 and Python 3, and it's also cleaner and
> more powerful in what it can express. I can try out incorporating
> piculet into IMDbPY on another branch and we'll see if that route is
> worth pursuing. Piculet will use elementtree, or lxml if available. A
> possible downside is that it might be slower due to the HTML
> normalization step at the beginning.
>
> Regarding tests, I have some work left over from my earlier attempts at
> porting IMDbPY to Python 3. I will send them as a pull request in a few
> days. Or I can make them a separate repository like imdbpy-testsuite.
>
> Turgut
>
> On 11/05/2017 05:51 PM, Davide Alberani wrote:
>> Hi everyone,
>>
>> I've completed a first round of changes into the
>> https://github.com/alberanid/imdbpy/tree/codename-simplify branch.
>>
>> Right now:
>> - Python 3 is supported, for the http parser
>> - I've simplified the setup.py to always require lxml and only support
>>   SQLAlchemy
>>
>> What can be done:
>>
>> 1. I've not yet removed bsoup support, and I'm still undecided about it.
>>    To test it, one can just remove lxml after it has been installed.
>>    I assume it's broken, since I've not fixed anything there except
>>    what 2to3 has done.
>>
>>    If it's a simple thing to fix, I guess we can keep it as a
>>    fallback; otherwise I've no problem introducing the lxml dependency.
>>
>> 2. tests, tests, tests.
>>    I've just done some manual tests, and most of the base features seem OK.
>>    If anyone finds a problem, please notify us (and/or provide a patch ;-))
>>
>> 3. SQL parser support for Python 3.
>>    I'll work on this in the coming weeks.
>>
>> 4. Later, I want to see if using "from future import ..." it's possible
>>    to reintroduce support for Python 2.7
>>
>>
>> On Thu, Nov 2, 2017 at 8:46 PM, Davide Alberani
>> <davide.alber...@gmail.com> wrote:
>>> On Thu, Nov 2, 2017 at 5:51 PM, H. Turgut Uyar <u...@tekir.org> wrote:
>>>>
>>>> A similar thing could be done with respect to the print function. Out of
>>>> curiosity, do you plan to use 2to3 for this, or do you plan to do it
>>>> manually?
>>>
>>> Both: first round with 2to3, then some manual fixes.
>>>
>>> I also compare the changes provided by others in
>>> https://github.com/alberanid/imdbpy/pull/45 and
>>> https://github.com/alberanid/imdbpy/pull/39
>>>
>>> The first tests show that there's hope. ;-)
>>>
>>>
>>> --
>>> Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x3845A3D4AC9B61AD]
>>> http://www.mimante.net/
>>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org!
> http://sdm.link/slashdot
> _______________________________________________
> Imdbpy-devel mailing list
> Imdbpy-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
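
A footnote on the page cache mentioned at the top of this message: the idea of downloading a page once and reusing the saved copy on later runs can be sketched roughly as below. This is only an illustration, not the code in imdbpy-tests. The function names (download, cached_page), the hashed file names and the replaceable fetch parameter are all assumptions made for the example.

```python
import hashlib
import os
from urllib.request import urlopen


def download(url):
    """Fetch a URL over the network and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def cached_page(url, cache_dir, fetch=download):
    """Return the page for `url`, calling `fetch` only if it is not cached.

    The cache file name is a SHA-1 hash of the URL. That naming scheme is
    an assumption of this sketch, not necessarily what imdbpy-tests uses.
    """
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    cache_file = os.path.join(cache_dir, name)

    # Cache hit: reuse the saved copy and skip the download entirely.
    if os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as f:
            return f.read()

    # Cache miss: download once and save the page for subsequent runs.
    content = fetch(url)
    with open(cache_file, "w", encoding="utf-8") as f:
        f.write(content)
    return content
```

With cache_dir set to something like tests/.cache, the first call for a given URL goes to the network and later calls read the saved file. Passing a stub function as fetch lets the caching logic itself be tested offline.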