I don't think that a belief network is going to give you much lift.  It also
sounds like you don't have many other users looking at the data at the same
time, so you probably don't have much crowd-sourced data to work with.

My feeling is that you should use a standard machine learning technique such
as SVM or logistic regression to build your model based on whatever features
you have available.  There are a number of ways you could implement this.

One is to simply run R as a server and access it from Java.  That would give
you the quickest and simplest way to implement the machine learning part of
what you are doing with the least hassle in your Swing app.  Rserve lets
you do this with a minimum of fuss.  I would preprocess the data for your
10,000 instances off-line and store them in the R environment.  Then,
every time the user does something interesting, you can pass their entire
history to the R server, rerun learning from scratch and then evaluate all
instances again, passing the top few back to your Swing program.
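
To make the retrain-and-rerank loop concrete, here is a minimal sketch of
the same cycle done in-process in plain Java rather than through R and
Rserve, purely so that the example is self-contained.  It assumes the user's
history is a set of feature vectors with 0/1 labels, and it fits a small
logistic regression by batch gradient descent.  The class and method names
(RerankSketch, train, score, topK) and the epoch and learning-rate
parameters are made up for illustration; an R model behind Rserve would
replace train and score.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: all names here are hypothetical, not from any library.
class RerankSketch {

    /**
     * Fit logistic regression from scratch by batch gradient descent.
     * x holds one feature vector per labelled instance, y holds 0/1 labels.
     * Returns the weights with the bias term stored last.
     * Assumes x is non-empty and all rows have the same length.
     */
    static double[] train(double[][] x, int[] y, int epochs, double rate) {
        int d = x[0].length;
        double[] w = new double[d + 1];
        for (int e = 0; e < epochs; e++) {
            double[] grad = new double[d + 1];
            for (int i = 0; i < x.length; i++) {
                double err = score(w, x[i]) - y[i];
                for (int j = 0; j < d; j++) {
                    grad[j] += err * x[i][j];
                }
                grad[d] += err;
            }
            for (int j = 0; j <= d; j++) {
                w[j] -= rate * grad[j] / x.length;
            }
        }
        return w;
    }

    /** Estimated probability that the user prefers this instance. */
    static double score(double[] w, double[] features) {
        double z = w[features.length];
        for (int j = 0; j < features.length; j++) {
            z += w[j] * features[j];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Indices of the k highest-scoring instances, best first. */
    static int[] topK(double[] w, double[][] all, int k) {
        Integer[] idx = new Integer[all.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> -score(w, all[i])));
        int n = Math.min(k, idx.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = idx[i];
        }
        return out;
    }
}
```

Each time the user classifies another instance, the Swing code would call
train on the full history and then topK over all of the instances.  With
only a few features and at most 10,000 instances, retraining from scratch on
every interaction should be cheap enough that nothing incremental is needed.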

Mahout also has a few learning algorithms in preliminary form, but I really
think that you would do better with your small dataset to simply use
standard algorithms in R.

On Tue, Jan 5, 2010 at 8:14 AM, Graham Allan <[email protected]> wrote:

> Hey all,
>
> I've not yet fully dove into using Mahout, but I've been subscribed to this
> list for a few months and the high level of experience and talent is
> obvious.
> For the most part the discussions go way over my head! I'm working on that
> though ;-)
>
> I have a scenario where I want to order a list of instances based on the
> preference of a single user. This will be as part of an existing Java Swing
> application. There is likely to be no more than 10,000 instances, maybe a
> few hundred of which the user will wish to manually go through (it's a code
> checking tool for Java). My aim is to accurately present the most preferred
> instances at the start of the list.
>
> I intend to use a Bayesian/belief network to continuously reorder the list
> as the user inspects instances. However, before this happens, I wish to have
> a supervised learning session to train the network. For the learning session
> to be as productive as possible I wish to order all the instances based on
> the information gain they provide, and have the user classify a small
> percentage of these. The instances will have a small number of features, and
> they may or may not have correlating features.
>
> This is the scenario that's left me wondering, does Mahout have anything
> which can help me here?
>
> This is for a university course, and I'm just introducing myself to machine
> learning techniques, so if my terminology is off, or this scenario is
> fundamentally flawed, I'd really like to hear about that too.
>
> Kind regards,
> Graham
>
>


-- 
Ted Dunning, CTO
DeepDyve
