Hi,

On Wed, May 5, 2010 at 8:41 AM, Sean Owen <sro...@gmail.com> wrote:
> You might have to be more specific. Support for this in the context of
> what: recommendations, clustering?

Classification, clustering, and recommendation are the most important ones.

> You can probably fit such concepts into any framework with enough
> cleverness, so in that sense, as a general framework, sure, I don't see
> why any algorithm couldn't eventually be applied to such data.
>
> This is a fairly specific kind of data model, so I am not sure if it
> would be something explicitly supported in some special way.

I'm currently working on a system that implements several non-parametric
machine learning techniques for multi-relational data (K-Medoids, KNN,
etc.), and it works quite nicely with data that fits in memory. However, I
have some new, huge datasets, so I'll probably need some kind of
parallelization, and Mahout seems like a good solution. The main purpose of
my email was to see if anyone else out there is working on the same thing.

From a quick look at the code, a straightforward solution would be to define
a new type of Vector (it wouldn't be a vector in the mathematical sense,
just a way to store relational information about an instance), and some
DistanceMeasures to work with that vector. Then we could use distance-based
techniques, such as canopy clustering and k-means.

Are there any plans to implement more distance-based (or kernel-based)
algorithms, such as SVMs and KNN?

Cheers,
Pedro

> On Wed, May 5, 2010 at 1:26 PM, Pedro Oliveira <cpdom...@gmail.com> wrote:
> > Hi,
> >
> > I have a simple question: does Mahout support, or plan to support,
> > multi-relational datasets?
> > I.e., datasets where each instance can have a variable number of values
> > in an attribute, and values can be other instances?
> > The basic example is a social network, where each person has several
> > attributes, and some attributes, like "knows", can have several distinct
> > values, and these values are other persons.
> > These datasets are usually very sparse (there are lots of distinct
> > attributes, but each instance only has values for a few of them), and
> > the relational information is very relevant (in the social network
> > example, the acquaintances of our acquaintances are relevant).
> >
> > Cheers,
> > Pedro
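
As a rough illustration of the "relational Vector plus DistanceMeasure" idea
described above, here is a minimal Python sketch. It is not Mahout code: the
Instance type, the function names, and the choice of Jaccard distance
averaged over attributes are all assumptions made for illustration, and the
sketch only compares direct attribute values (it does not follow links to
acquaintances of acquaintances).

```python
from typing import Dict, FrozenSet, Hashable

# Hypothetical representation: attribute name -> set of values.
# A value may be a plain literal or the id of another instance.
Instance = Dict[str, FrozenSet[Hashable]]


def jaccard_distance(a: FrozenSet[Hashable], b: FrozenSet[Hashable]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def relational_distance(x: Instance, y: Instance) -> float:
    """Mean Jaccard distance over the attributes present in either instance.

    An attribute missing from one instance is treated as an empty value set,
    which suits sparse data where most attributes are absent.
    """
    attributes = set(x) | set(y)
    if not attributes:
        return 0.0
    empty: FrozenSet[Hashable] = frozenset()
    total = sum(
        jaccard_distance(x.get(attr, empty), y.get(attr, empty))
        for attr in attributes
    )
    return total / len(attributes)


def nearest_neighbour(query_id: str, data: Dict[str, Instance]) -> str:
    """Id of the instance closest to query_id (ties broken by id order)."""
    candidates = sorted(k for k in data if k != query_id)
    return min(
        candidates,
        key=lambda k: relational_distance(data[query_id], data[k]),
    )


# Toy social network: "knows" holds ids of other instances.
people: Dict[str, Instance] = {
    "ann": {"knows": frozenset({"bob", "carl"}), "city": frozenset({"Lisbon"})},
    "bob": {"knows": frozenset({"ann", "carl"}), "city": frozenset({"Lisbon"})},
    "dan": {"knows": frozenset({"eve"})},
}

print(nearest_neighbour("ann", people))
```

Any distance-based method (KNN, K-Medoids, canopy clustering) only needs the
relational_distance function, which is why plugging in a custom distance
measure looks like the least invasive route. Note that k-means also needs a
way to average instances into a centroid, which this representation does not
provide.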