Hi Xun,

I posted my SoC review to Google and I expect the process over there to move quickly. Thanks to your work, I think we now have several interesting things:
- a working parcel to run experiments with
- a set of results obtained on Slashdot feeds showing that auto tagging after a learning phase is feasible
- a path to incorporate PCA into Chandler (using MDP)

I've been discussing with vikSIT on IRC what the next steps should be. Eventually, the goal is to have auto tagging turned on (optionally) on the trunk, but we won't see that in Chandler before we have a tagging feature in (planned for alpha6). In the meantime, there's a set of short-term things we should do:

1- clean up http://wiki.osafoundation.org/bin/view/Journal/InternProjectMVA : I posted a set of comments on this page back in July but they still haven't been properly incorporated (assuming I was right). I think you should do that, Xun.

2- update the parcel to run against the 0.7alpha4 trunk code: right now, it runs against 0.7alpha1, which is kind of old. I know that the egg stuff sort of threw a wrench in your plan but we should get past that hurdle. Maybe vikSIT or other contributors (Markku?) could help there.

4- incorporate the MDP library: we need to find a place to park it and start using it. Classically in Chandler we download tarballs and create a specific Makefile for such projects under /external. That's what we do for icu, PyLucene, twisted and the like, so it looks like we should do just the same for MDP. Maybe bear can help or give us some advice here.

5- make the relevant changes in the MVA project to call the MDP library: the first thing will be to compute the eigenvectors correctly. Right now the code simply computes an average vector (cumulative, non-normalized) per tag (until we reach a given threshold), then projects the new vectors onto them. The threshold for accumulation is not grounded in data, the threshold for attribution (when projecting) is not grounded in data either, and we run no analysis of which dimensions contribute to the variance. We need to improve all of that and use the MDP calls for it.
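To make item 5 a bit more concrete, here is a minimal sketch of what "compute the eigenvectors and look at the variance" could mean on a toy data set. It is not the MVA parcel code and it does not call MDP: it is plain Python using power iteration, which only recovers the leading principal component. All function names and the toy term-count vectors are hypothetical, made up for illustration.

```python
import math


def mean_center(rows):
    """Return (centered rows, per-dimension means)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    return centered, means


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of mean-centered rows."""
    n = len(centered)
    dims = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)]
            for i in range(dims)]


def mat_vec(matrix, v):
    """Multiply a square matrix by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


def leading_eigenpair(matrix, iterations=200):
    """Power iteration: (largest eigenvalue, unit eigenvector).

    Assumption: the start vector is not orthogonal to the leading
    eigenvector; a real implementation would guard against that.
    """
    dims = len(matrix)
    v = [float(i + 1) for i in range(dims)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: no variance at all
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def explained_variance_ratio(matrix, eigenvalue):
    """Share of the total variance (matrix trace) carried by one component."""
    total = sum(matrix[i][i] for i in range(len(matrix)))
    return eigenvalue / total if total else 0.0


def project(vector, means, component):
    """Coordinate of a new vector along the component, after centering."""
    return sum((vector[d] - means[d]) * component[d]
               for d in range(len(vector)))


# Hypothetical term-count vectors for items carrying one tag.
rows = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0], [6.0, 0.0, 1.0], [8.0, 0.0, 1.0]]
centered, means = mean_center(rows)
cov = covariance(centered)
eigenvalue, component = leading_eigenpair(cov)
ratio = explained_variance_ratio(cov, eigenvalue)
score = project([7.0, 0.0, 1.0], means, component)
```

The explained variance ratio is the kind of number that could ground the thresholds in data instead of picking them by hand: dimensions (or components) that carry almost no variance can be dropped before projecting new vectors.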

For the time being, we'll continue to use sandbox/xluo as the repository for this code. Eventually, we'll want to move it to chandler/projects once we prove that it's worthwhile. For the moment, though, it would be a drag on the project to maintain that code on the trunk (it would be subject to all the QA/build/release engineering constraints we have for everything that lands there), and most of the would-be contributors don't have svn trunk privileges anyway. However, working with patches from several sandboxes would be annoying, so I'm proposing that we consider sandbox/xluo as the official repo for MVA.

Also, I propose we continue with the Slashdot experiment you started. Clearly, we'll have to grow out of it at some point, but I see no reason to do that as long as we don't have a better MVA model.

What do you guys think?
BTW, just as a roll call, who on this list would be interested in contributing to this project going forward?

Cheers,
- Philippe
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
