Clustering from DB

nfantone Fri, 26 Jun 2009 07:21:28 -0700

Hi to you all, Mahout users. I'm new to the list and to Mahout itself,
and I'm trying to integrate Taste into my project, in which I need to
cluster user data from a very large data set based on user behavior,
which is stored in some tables in a local database. From what I've
read and experimented with, clustering in Mahout takes advantage of
HDFS and Lucene indexing, converting plain CSV files to Vectors. So I
ask: is it mandatory to create plain text files (or HDFS files) and
indexes from the data in my DB in order to feed the clustering
algorithms' input? Couldn't I somehow create the Vectors directly and
then use them to start the clustering jobs? Is there any convenient
way to achieve this? I've not seen anything similar to the "DataModel"
interface used by Recommenders for JDBC connections (or any other
connectivity API), and the runJob static methods take paths for both
input and output which, a priori, I don't have any use for. The
documentation wasn't helpful either, as the "From a Database" section
of "Creating Vectors from Text" is currently empty.


Any kind of help would be appreciated. If I manage to get this to
work, I would be more than glad to share the result and help with the
documentation or wiki.
