Hi,

Great work, thanks! What is the minimum supported Python 3 version?
I would rather have bsoup removed at the moment and maybe added back later. Currently the bsoup and lxml parsers require different preprocessors because they produce different DOM trees from the same markup.

When I refactored the extractors-attributes in IMDbPY into a separate package (piculet), I went another way: first I try to normalize the HTML code so that all parsers will parse it the same way, then I apply the extraction rules. In my tests, piculet worked alright on the IMDb markup. Its syntax is very close to the extractors-attributes syntax in IMDbPY, it supports both Python 2 and Python 3, and it's also cleaner and more powerful in what it can express.

I can try out incorporating piculet into IMDbPY on another branch and we'll see if that route is worth pursuing. Piculet will use ElementTree, or lxml if available. A possible downside is that it might be slower due to the HTML normalization step at the beginning.

Regarding tests, I have some work left over from my earlier attempts at porting IMDbPY to Python 3. I will send them as a pull request in a few days, or I can make them a separate repository like imdbpy-testsuite.

Turgut

On 11/05/2017 05:51 PM, Davide Alberani wrote:
> Hi everyone,
>
> I've completed a first round of changes in the
> https://github.com/alberanid/imdbpy/tree/codename-simplify branch.
>
> Right now:
> - Python 3 is supported, for the http parser
> - I've simplified the setup.py to always require lxml and only support
> SQLAlchemy
>
> What can be done:
>
> 1. I've not yet removed bsoup support, and I'm still undecided about it.
> To test it, one can just remove lxml after it was installed.
> I assume it's broken, since I've not fixed anything there, except
> what 2to3 has done.
>
> If it's a simple thing to fix, I guess we can keep it as a
> fallback; otherwise I've no problem introducing the lxml dependency.
>
> 2. tests, tests, tests.
> I've just done some manual tests, and most of the base features seem ok.
> If anyone finds a problem, please notify us (and/or provide a patch ;-))
>
> 3. SQL parser support for Python 3.
> I'll work on this in the next weeks.
>
> 4. Later, I want to see if, using "from __future__ import ...", it's possible
> to reintroduce support for Python 2.7.
>
>
> On Thu, Nov 2, 2017 at 8:46 PM, Davide Alberani
> <davide.alber...@gmail.com> wrote:
>> On Thu, Nov 2, 2017 at 5:51 PM, H. Turgut Uyar <u...@tekir.org> wrote:
>>>
>>> A similar thing could be done with respect to the print function. Out of
>>> curiosity, do you plan to use 2to3 for this, or do you plan to do it
>>> manually?
>>
>> Both: first round with 2to3, then some manual fixes.
>>
>> I also compare the changes provided by others in
>> https://github.com/alberanid/imdbpy/pull/45 and
>> https://github.com/alberanid/imdbpy/pull/39
>>
>> The first tests show that there's hope. ;-)
>>
>>
>> --
>> Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x3845A3D4AC9B61AD]
>> http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
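The "normalize first, then extract" approach Turgut describes for piculet (make the markup regular so every parser builds the same tree, then apply extraction rules, using lxml when it is installed and ElementTree otherwise) can be illustrated with a small sketch. This is not piculet's code or API. `normalize`, `extract` and `_Normalizer` are hypothetical names, and the normalization rules (self-close void elements, drop stray end tags, auto-close elements left open) are assumptions chosen for illustration. Extraction rules are written as ElementPath expressions because both ElementTree and lxml accept them in `findall`.

```python
"""Hypothetical sketch: normalize messy HTML into well-formed XML, then extract.

Not piculet's implementation; names and normalization rules are assumptions.
"""
from html import escape
from html.parser import HTMLParser

try:
    from lxml import etree  # prefer lxml when it is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class _Normalizer(HTMLParser):
    """Re-serialize HTML as well-formed XML (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized output fragments
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        # Boolean attributes (value None) become empty-string attributes.
        attr_text = "".join(
            ' {}="{}"'.format(name, escape(value or "", quote=True))
            for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.out.append("<{}{}/>".format(tag, attr_text))
        else:
            self.out.append("<{}{}>".format(tag, attr_text))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # drop stray end tags that close nothing
        # Close any elements still open inside this one, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</{}>".format(open_tag))
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape for XML.
        self.out.append(escape(data, quote=False))


def normalize(html_text):
    """Return a well-formed XML string wrapped in a single <root> element."""
    parser = _Normalizer()
    parser.feed(html_text)
    parser.close()
    while parser.stack:  # close whatever is still open at end of input
        parser.out.append("</{}>".format(parser.stack.pop()))
    return "<root>" + "".join(parser.out) + "</root>"


def extract(html_text, path):
    """Apply one extraction rule (an ElementPath expression) to the page."""
    root = etree.fromstring(normalize(html_text))
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


if __name__ == "__main__":
    page = '<div class="title">Casablanca <br>(1942)</div><p>Unclosed'
    print(extract(page, ".//div[@class='title']"))
```

Because the extraction step only ever sees the normalized tree, the same rules work whichever backend was imported; the price is the extra pass over the markup, which is the slowdown Turgut mentions.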