> Well, these are mostly class attributes so it shouldn't matter.
>
> What's worse is that we have 3 different parse tree incarnations,
> which we'll probably merge (again changing the API).
>
> You'll probably have to wait for a 1.0 version if you want API
> stability, sorry.
>
> - Ralf

Yes, I noticed the multiple incarnations in your repository after writing my email, and was curious about the aims of mwlib-tidy.

I don't understand why so many class variables, such as "thumb" and "langlink", need to be part of the __dict__ of every node. As far as I can tell, .langlink is only used in __repr__; surely there is a neater way to do this.

Ideally, I would like to see parse tree properties simplified, both for increased usability and so that pickled parse trees are not excessively large. (Currently, pickled-and-zipped parses of English Wikipedia take up 40GB using an mwlib from last year.)

If there's anything I can do to help simplify the parse nodes, I'm willing to help out. But I'm afraid of doing much, precisely because there seem to be too many incarnations at the moment.

- Joel
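
To make the class-attribute versus per-node __dict__ question concrete, here is a minimal sketch in plain Python. The classes `Node` and `FatNode` and the helper `pickled_state_size` are hypothetical stand-ins written for illustration, not mwlib's actual node classes; only the attribute names "thumb" and "langlink" come from the discussion above. It shows that defaults defined on the class do not land in each instance's __dict__, whereas attributes assigned in __init__ do, and that this difference shows up in the pickled state.

```python
import pickle


class Node:
    """Hypothetical node: defaults live on the class, not on each instance."""

    thumb = False
    langlink = None

    def __init__(self, caption):
        self.caption = caption
        self.children = []


class FatNode:
    """Hypothetical node: the same defaults are copied into every instance."""

    def __init__(self, caption):
        self.caption = caption
        self.children = []
        self.thumb = False
        self.langlink = None


def pickled_state_size(obj):
    # By default pickle stores an instance's __dict__ as its state, so the
    # size of the pickled __dict__ approximates the per-node cost.
    return len(pickle.dumps(vars(obj), protocol=pickle.HIGHEST_PROTOCOL))


lean = Node("x")
fat = FatNode("x")

print(sorted(vars(lean)))  # ['caption', 'children']
print(sorted(vars(fat)))   # ['caption', 'children', 'langlink', 'thumb']
print(pickled_state_size(lean) < pickled_state_size(fat))  # True

# Only a node that overrides a default carries the attribute itself.
lean.thumb = True
print(sorted(vars(lean)))  # ['caption', 'children', 'thumb']
```

If the real nodes set such attributes per instance rather than on the class, moving the defaults to class level (or declaring `__slots__`) would be one way to shrink the pickles; whether that applies to mwlib depends on how its node classes are actually written.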
