Re: [Imdbpy-devel] IMDbPY revamp

H. Turgut Uyar Sun, 05 Nov 2017 10:17:39 -0800

Hi,

Great work, thanks! What is the minimal supported Python 3 version?

I would rather have bsoup removed at the moment and maybe added back
later. Currently the bsoup and lxml parsers require different
preprocessors because their parsers come up with different DOM trees.
When I refactored the extractors-attributes in IMDbPY into a separate
package (piculet) I went another way: first I try to normalize the HTML
code so that parsers will parse it the same way, then I apply the
extraction rules. In my tests, piculet worked all right on the IMDb
markup. Its syntax is very close to the extractors-attributes syntax in
IMDbPY, it supports both Python 2 and Python 3, and it's also cleaner and
more powerful in what it can express. I can try out incorporating
piculet into IMDbPY on another branch and we'll see if that route is
worth pursuing. Piculet will use elementtree, or lxml if available. A
possible downside is that it might be slower due to the HTML
normalization step at the beginning.
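
To illustrate the "normalize first, then extract" idea, here is a minimal
sketch. It is not piculet's actual code or rule syntax: the Normalizer
class, the normalize() and extract() helpers, the RULES mapping and the
sample markup are all hypothetical names made up for this example. It
assumes Python 3 and uses only the standard library, picking lxml instead
of ElementTree when it happens to be installed.

```python
from html import escape
from html.parser import HTMLParser

try:
    # Use lxml if it is available...
    from lxml import etree as ElementTree
except ImportError:
    # ...otherwise fall back to the standard library.
    import xml.etree.ElementTree as ElementTree

# Elements that never have a closing tag in HTML (partial list, for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Normalizer(HTMLParser):
    """Rewrite tag soup as well-formed XML (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append("<%s%s/>" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag, or end tag of a void element
        # Close every element left open inside the one being closed.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append("</%s>" % current)
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def normalize(markup):
    """Return markup as XML that any XML parser reads the same way."""
    parser = Normalizer()
    parser.feed(markup)
    parser.close()
    while parser.open_tags:  # close whatever the document left open
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


def extract(markup, rules):
    """Apply key -> path rules to the normalized tree."""
    root = ElementTree.fromstring(normalize(markup))
    result = {}
    for key, path in rules.items():
        node = root.find(path)
        if node is not None and node.text:
            result[key] = node.text.strip()
    return result


# Made-up markup with unquoted attributes and unclosed tags.
SAMPLE = ('<html><body><h1 class=title>Alien<br>'
          '<p>Year: <span class="year">1979</span></body>')
RULES = {"title": ".//h1", "year": ".//span[@class='year']"}

if __name__ == "__main__":
    print(extract(SAMPLE, RULES))
```

Since the extraction step only ever sees the normalized XML, the same
rules work with either parser; the price is the extra pass over the
markup, which is where the possible slowdown would come from.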

Regarding tests, I have some work left over from my earlier attempts at
porting IMDbPY to Python 3. I will send them as a pull request in a few
days. Or I can make them a separate repository like imdbpy-testsuite.

Turgut

On 11/05/2017 05:51 PM, Davide Alberani wrote:
> Hi everyone,
> 
> I've completed a first round of changes into the
> https://github.com/alberanid/imdbpy/tree/codename-simplify branch.
> 
> Right now:
> - Python 3 is supported for the http parser
> - I've simplified the setup.py to always require lxml and only support
> SQLAlchemy
> 
> What can be done:
> 
> 1. I've not yet removed bsoup support, and I'm still undecided about it.
> To test it, one can just uninstall lxml after it has been installed.
> I assume it's broken, since I've not fixed anything there except
> what 2to3 has done.
> 
> If it's a simple thing to fix it, I guess we can keep it as a
> fallback, otherwise I've no problem introducing the lxml dependency.
> 
> 2. tests, tests, tests.
> I've just done some manual tests, and most of the base features seem ok.
> If anyone finds a problem, please notify us (and/or provide a patch ;-))
> 
> 3. SQL parser support for Python 3.
> I'll work on this in the next weeks.
> 
> 4. Later, I want to see if it's possible to reintroduce support for
> Python 2.7 using "from future import ..."
> 
> 
> 
> 
> On Thu, Nov 2, 2017 at 8:46 PM, Davide Alberani
> <[email protected]> wrote:
>> On Thu, Nov 2, 2017 at 5:51 PM, H. Turgut Uyar <[email protected]> wrote:
>>>
>>> A similar thing could be done with respect to the print function. Out of
>>> curiosity, do you plan to use 2to3 for this, or do you plan to do it
>>> manually?
>>
>> Both: first round with 2to3, then some manual fixes.
>>
>> I'm also comparing against the changes provided by others in
>> https://github.com/alberanid/imdbpy/pull/45 and
>> https://github.com/alberanid/imdbpy/pull/39
>>
>> The first tests show that there's hope. ;-)
>>
>>
>> --
>> Davide Alberani <[email protected]>  [PGP KeyID: 0x3845A3D4AC9B61AD]
>> http://www.mimante.net/
> 
> 
> 

