Hi all,

I'm new to the list and to Mahout itself. I'm trying to integrate Taste into my project, where I need to cluster users from a very large data set based on their behavior, which is stored in a few tables in a local database.

From what I've read and experimented with, clustering in Mahout relies on HDFS and Lucene indexing, converting plain CSV files to Vectors. So my question is: is it mandatory to create plain text files (or HDFS files) and indexes from the data in my DB in order to feed the clustering algorithms? Couldn't I somehow create the Vectors directly and then use them to start the clustering jobs? Is there a convenient way to do this?

I haven't seen anything similar to the "DataModel" interface that the Recommenders use for JDBC connections (or any other connectivity API), and the runJob static methods take paths for both input and output, which, a priori, I have no use for. The documentation wasn't helpful either, as the "From a Database" section of "Creating Vectors from Text" is currently empty.
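
To make the question concrete, here is a rough sketch of the first half of what I have in mind, in plain Java with no Mahout dependency. RowVectorizer and the column names are made up for illustration, and the rows are assumed to have already been read from the database (for example by looping over a JDBC ResultSet). My assumption is that each resulting array could then be wrapped in a Mahout DenseVector and handed to the clustering code, but that is exactly the part I'm unsure about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: turns database rows into dense feature arrays.
class RowVectorizer {

    // Builds one feature array per row, reading the given columns in order.
    // A missing or NULL column is encoded as 0.0 (an assumption; another encoding may fit better).
    static List<double[]> vectorize(List<Map<String, Number>> rows, List<String> columns) {
        List<double[]> vectors = new ArrayList<>();
        for (Map<String, Number> row : rows) {
            double[] features = new double[columns.size()];
            for (int i = 0; i < columns.size(); i++) {
                Number value = row.get(columns.get(i));
                features[i] = (value == null) ? 0.0 : value.doubleValue();
            }
            vectors.add(features);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // One made-up user row; "purchases" is deliberately absent to show the 0.0 default.
        Map<String, Number> row = new HashMap<>();
        row.put("logins", 3);
        row.put("avg_session_minutes", 12.5);

        List<String> columns = Arrays.asList("logins", "purchases", "avg_session_minutes");
        List<double[]> vectors = vectorize(Arrays.asList(row), columns);

        System.out.println(Arrays.toString(vectors.get(0))); // prints [3.0, 0.0, 12.5]
    }
}
```

What I can't figure out is the step after this: how to pass such in-memory vectors to a clustering job without first writing them out to files.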
Any help would be appreciated. If I manage to get this working, I'd be more than glad to share it and help with the documentation or wiki.
