On 25 Feb 2012, at 16:09, Davide Alberani wrote:

> Yep, I kept the indexes at the minimum... maybe too much. :-)

This is a reasonable approach. Sometimes too many indexes can be worse than no index at all :D

>> I just wanted to know if you're ok pulling this upstream. Otherwise
>> I'll maintain a patch on my side!
>
> No problem introducing them, but... wouldn't it be easier if you just
> joined the IMDbPY project? :-P

Fair enough! I can help you with SQL nightmares. To be honest, I only use the parsing and SQL importing part of your project, which is by itself awesome :p

> If it's ok for you, give me your sourceforge and bitbucket usernames, and
> I'll grant you write permission to the repository.

I saw that your github repo is just a mirror? Is it possible to use pull requests on github, or is it a read-only repo? The point is, it's easier on github to propose a patch and discuss it. If it's not possible, I'll create a sourceforge account :)

> It's ok even if you don't really have much time to dedicate to it: when
> you need something, just ask me (to be sure that it will not create some
> strange problems that only I can know about) and commit. :-)

I don't have much time, my startup eats a lot of it... But your project plays an important part in mine, so if I can share my optimizations, be sure I'll commit them!
I'll probably make one of my own projects public one day. It's Python; it imports IMDb with IMDbPY and adds Wikipedia synopses and bios (you can target a language), freebase, themoviedb, thetvdb (etc.) to build a universal and automated movie database. For now, the code is too dirty to be public :D

> Point is, IMDbPY was always a playground, for me (come on... we didn't
> REALLY need to support both SQLObject and SQLAlchemy ORMs ;-) and I'm not
> an expert on databases.
>
> On the db, actually, I'd like to see this work done:
> 1. an overall evaluation of the current status: is the structure still
>    valid/meaningful?
>    (keeping in mind that many choices were due to performance reasons - maybe
>    with wrong assumptions on my side).
> 2. introduce indexes/foreign keys where needed.
> 3. analyze if it's possible to get rid of the movie_info_idx tables:
>    it's exactly the same as movie_info, but it contains only the rating
>    information, and more indexes are created on them (point is: does
>    creating these indexes for movie_info really waste so much space that
>    it's not worth it, or was it all in my head? ;-)

Your choices are mainly smart, the schema isn't so bad ;-) The nature of the data is complex, with a lot of relations, and for performance reasons I think the denormalization of movie_info_idx isn't a bad thing.

Let me know about the github thing :)

> --
> Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
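
On the movie_info_idx question, here is a rough sketch of the trade-off being discussed, using Python's sqlite3 with an in-memory database. The two tables are a simplified, hypothetical version of the real schema (the column names and the sample data are assumptions made up for illustration, not the actual IMDbPY layout). It only shows that the extra index on the small rating-only table lets a rating lookup avoid a full scan, while the big table stays free of that index.

```python
import sqlite3

# Hypothetical, simplified schema: column names are assumptions for
# illustration only, not the real IMDbPY table definitions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_info (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        info_type TEXT NOT NULL,
        info TEXT NOT NULL
    );
    CREATE TABLE movie_info_idx (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        info_type TEXT NOT NULL,
        info TEXT NOT NULL
    );
    -- The extra index lives only on the small, rating-only table.
    CREATE INDEX movie_info_idx_info ON movie_info_idx (info_type, info);
""")

# Made-up sample data: every movie gets a plot, a trivia entry and a rating.
all_rows, rating_rows = [], []
for movie_id in range(1, 1001):
    rating = (movie_id, "rating", f"{movie_id % 10 + 0.5:.1f}")
    all_rows.append((movie_id, "plot", f"plot of movie {movie_id}"))
    all_rows.append((movie_id, "trivia", f"trivia about movie {movie_id}"))
    all_rows.append(rating)
    rating_rows.append(rating)

insert = "INSERT INTO {} (movie_id, info_type, info) VALUES (?, ?, ?)"
conn.executemany(insert.format("movie_info"), all_rows)
conn.executemany(insert.format("movie_info_idx"), rating_rows)

QUERY = "SELECT movie_id FROM {} WHERE info_type = ? AND info = ?"
PARAMS = ("rating", "7.5")


def query_plan(table):
    """Return SQLite's query plan for the rating lookup on `table`."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY.format(table), PARAMS)
    # The fourth column of each plan row is the human-readable detail.
    return " ".join(row[3] for row in plan)


plan_idx = query_plan("movie_info_idx")
plan_full = query_plan("movie_info")
matches_idx = sorted(conn.execute(QUERY.format("movie_info_idx"), PARAMS))
matches_full = sorted(conn.execute(QUERY.format("movie_info"), PARAMS))

print("movie_info_idx:", plan_idx)
print("movie_info:    ", plan_full)
print("same result from both tables:", matches_idx == matches_full)
```

Whether the same index on movie_info itself would really cost too much space is something only a measurement on a full import can tell; the sketch just makes the two options concrete.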