[Chandler-dev] [MVA] Next steps after SoC project

Philippe Bossut Tue, 29 Aug 2006 15:32:30 -0700

Hi Xun,

I posted my SoC review to Google and I expect the process over there to be expedited swiftly. Thanks to your work, I think we now have several interesting things:

- a working parcel to make experiments

- a set of results obtained on Slashdot feeds showing that auto tagging after a learning phase is feasible

- a path to incorporate PCA into Chandler (using MDP)

I've been discussing with vikSIT on IRC what the next steps should be. Eventually, the goal will be to have auto tagging turned on (optionally) on the trunk, but we won't even see that in Chandler before we have a tagging feature in (planned for alpha6). In the meantime, there's a set of short term things we should do:

1- clean up http://wiki.osafoundation.org/bin/view/Journal/InternProjectMVA : I posted a set of comments on this page back in July but they still haven't been properly incorporated (assuming I was right). I think you should do that, Xun.

2- update the parcel to run against the 0.7alpha4 trunk code: right now, it runs against 0.7alpha1, which is kind of old. I know that the egg stuff sort of threw a wrench in your plan but we should get past that hurdle. Maybe vikSIT or other contributors (Markku?) could help there.

4- incorporate the MDP library: we need to find a place to park it and start using it. Classically in Chandler we download tarballs and create a specific Makefile for such projects under /external. That's what we do for icu, PyLucene, twisted and the like, so it looks like we should do just the same for MDP. Maybe bear can help / give us some advice here.

5- make the relevant changes in the MVA project to call the MDP library: the first thing will be to compute the eigenvectors correctly. Right now the code simply computes an average vector (cumulative, not normalized) per tag (until we reach a given threshold), then projects the new vectors onto them. The threshold for accumulation is not grounded in data, the threshold for attribution (when projecting) is not grounded in data either, and we run no analysis of which dimensions contribute to the variance and which don't. We need to improve all of that and use the MDP calls to do so.
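To make the last item concrete, here is a minimal sketch of the "average vector per tag, then project" idea, plus a per-dimension variance check. It is not the code in the parcel and it is not PCA: the function names, the toy word-count vectors and the 0.6 threshold are all assumptions made up for illustration, and, unlike the current code, the averages are normalized so that the attribution threshold is a cosine similarity. It uses only the Python standard library; the eigenvector computation itself is what we would hand over to MDP.

```python
import math
from collections import defaultdict


def tag_centroids(samples):
    """Average the feature vectors seen for each tag, scaled to unit length.

    samples is a list of (tag, vector) pairs (hypothetical input format).
    """
    sums = {}
    counts = defaultdict(int)
    for tag, vec in samples:
        if tag not in sums:
            sums[tag] = [0.0] * len(vec)
        sums[tag] = [s + v for s, v in zip(sums[tag], vec)]
        counts[tag] += 1
    centroids = {}
    for tag, total in sums.items():
        mean = [s / counts[tag] for s in total]
        norm = math.sqrt(sum(m * m for m in mean))
        centroids[tag] = [m / norm for m in mean] if norm else mean
    return centroids


def suggest_tags(vec, centroids, threshold):
    """Return the tags whose centroid has cosine similarity >= threshold."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return []
    hits = []
    for tag, centroid in centroids.items():
        score = sum(v * c for v, c in zip(vec, centroid)) / norm
        if score >= threshold:
            hits.append(tag)
    return sorted(hits)


def variance_per_dimension(vectors):
    """Population variance of each dimension across all vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    return [sum((v[d] - means[d]) ** 2 for v in vectors) / n
            for d in range(dims)]


# Toy word counts over an assumed vocabulary: linux, kernel, game, console.
SAMPLES = [
    ("linux", [3, 2, 0, 0]),
    ("linux", [2, 3, 0, 1]),
    ("games", [0, 0, 4, 2]),
    ("games", [0, 1, 2, 3]),
]
CENTROIDS = tag_centroids(SAMPLES)
```

With this toy data, suggest_tags([4, 1, 0, 0], CENTROIDS, 0.6) proposes only the "linux" tag, and variance_per_dimension([v for _, v in SAMPLES]) shows how much each word contributes to the spread. A real PCA would go further and look at the covariance between dimensions, which is the part the eigenvector computation covers.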

For the time being, we'll continue to use sandbox/xluo as a repository for this code. Eventually, we'll want to move that to chandler/projects once we prove that it's worthwhile but, for the moment, it would be a drag on the project to maintain that code in the trunk (it would be subject to all the QA/build/release engineering constraints we have for everything that lands on the trunk), and most of the would-be contributors don't have svn trunk privileges anyway. However, working with patches off several sandboxes would be annoying, so I'm proposing that we consider sandbox/xluo as the official repo for MVA.

Also, I propose we continue with the Slashdot experiment you started. Clearly, we'll have to grow out of it in a while, but I see no reason to do that as long as we don't have a better MVA model.


What do you guys think?

BTW, just as a roll call, who on this list would be interested in contributing to this project moving forward?


Cheers,
- Philippe
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
