Hi Viksit,
>> I will be happy to get suggestions and patches from you once I get
>> the initial release ready.
>
> Great. So if I interpret the current work in progress rightly - you're
> going to be releasing Xun's code into one of the releases, which can
> then be used as a platform for further enhancements to the project?
Xun's work is specific to the feeds parcel, whereas we aim to go
Chandler-wide right from the start. As the number of affected code
lines in Xun's parcel is also small, I see little value in porting it
to newer Chandler versions. Instead, I plan to go for the real thing
as soon as possible.
>> I have opened a new discussion in this forum for the selection of a
>> computational platform. I suggested using SciPy (or just NumPy)
>> and as
>> there has not been any feedback I think that we are going to include
>> that into our Chandler distribution. Implementing the necessary MVA
>> operations is an easy task (and we can always use other complementary
>> libraries if necessary).
>
> Right, I meant one of the above. So the final thing here would be to
> include SciPy into Chandler.
I would say that this is one of the first things to do ;)
>> MY CURRENT TODO LIST:
> BTW - I'm looking at some stuff here right now, but is there
> anything in particular I might look at?
>
Learn to use Lucene and PyLucene. We plan to use them, at least to
some degree, in this project. Lucene is an interesting open source
project with applications outside ours as well, so you cannot lose.
You might also want to refresh your statistical skills, for example
how to compute LDA, QDA, PCA, etc. with matrix operations (you will
learn to like singular value decomposition here).
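
To make the SVD remark concrete, here is a minimal sketch of PCA done
with plain matrix operations, assuming NumPy is available. The
function name pca_svd is only illustrative and is not part of any
Chandler code; LDA and QDA would need the class labels as well and
are not shown.

```python
import numpy as np


def pca_svd(X, n_components):
    """PCA of the rows of X via SVD of the mean-centered data matrix.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center each column (feature)

    # Xc = U * diag(s) * Vt; the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    components = Vt[:n_components]
    scores = Xc @ components.T  # data projected onto the axes
    # Variance along each axis (sample variance, n - 1 denominator).
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance
```

The variances returned here should agree with the largest eigenvalues
of the sample covariance matrix, which is a handy sanity check.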
>> [preliminary work]
>> 1) Learn how to use PyLucene (I am currently reading a book:
>> Lucene in
>> Action).
>
> Right, I've been experimenting with PyLucene myself. Got that book
> too, in fact.
Great. As I already said, now is the time to complete one's training
in the use of this library.
>> 2) Obtain a real world data set with tags (Philippe has agreed to
>> prepare one).
>
> I see. Could you elaborate on this point a bit? By real world dataset
> with tags - you mean already tagged email with relevant content
> that we
> would use to train the system?
Yes. We plan to build an empirical data set that contains over a
thousand real-world emails with human-assigned tags. This data set is
then partitioned into training, test, and validation sets, with
bootstrapping, to perform statistical inference. This is the only way
to assess how the developed system will perform in a real-world
setting. Simulated data sets tend to yield misleading results...
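
As a rough sketch of what I mean by partitioning and bootstrapping,
assuming NumPy: the 60/20/20 split, the function names and the
percentile interval are my own illustrative choices here, not a
decided design.

```python
import numpy as np


def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition range(n) into train/validation/test indices.

    The fractions are an assumption for illustration.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]  # whatever is left over
    return train, val, test


def bootstrap_accuracy(correct, n_resamples=1000, seed=0):
    """Percentile bootstrap interval for the mean of a 0/1 vector.

    `correct` holds 1 where the predicted tag matched the human tag
    on a held-out email and 0 otherwise.
    """
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    n = correct.size
    # Resample the emails with replacement, n_resamples times.
    idx = rng.integers(0, n, size=(n_resamples, n))
    means = correct[idx].mean(axis=1)
    low, high = np.percentile(means, [2.5, 97.5])
    return correct.mean(), (low, high)
```

The model would be fitted on the training indices, tuned on the
validation indices, and the bootstrap interval computed once on the
test indices.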
>> 3) Implement necessary MVA operations (model building, clustering,
>> etc).
>> 4) Play with the empirical data to see how well the system actually
>> works (I am going to make a feasibility study).
>> 5) Get tagging implemented in Chandler (I need to discuss this
>> with Grant next week when he is back here in the office).
>
> Right.
>
>> 6) Select a computational platform that is capable of matrix
>> operations and have it included in our Chandler distribution (I
>> will discuss this with Bear and Heikki next week).
>> [after 1-6 have been taken care of]
>> 7) Decide the best way to implement automatic tagging
>> functionality in
>> Chandler.
>> 8) Make all the necessary changes to repository schema, GUI, etc...
>
>> There is plenty of work to do so I really hope that we get a quick
>> start
>> on items 1-6.
>
> Indeed. The sooner, the better :) I'd be happy to discuss further
> on IRC
> or email about which tasks in particular need to be taken care of
> right
> now.
I appreciate your offer and will pitch ideas to you as soon as there
is something to start working with. For example, once we have a data
set and a first version of the system working, I am sure there will
be a lot of work in evaluating different models and fine-tuning their
parameters. This is also the most fun part of the work ;) At the
moment it is the lack of a data set that is holding us back. On
Monday I will open a new wiki page for the project, where I will
document all developments as they materialize. You should ask
Philippe whether you can get write access to this wiki page, as this
would make our collaboration so much easier.
Cheers,
Markku
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev