[jira] [Updated] (MAHOUT-1579) Implement a datamodel which can load data from hadoop filesystem directly

Xiaomeng Huang (JIRA) Thu, 12 Jun 2014 22:11:21 -0700

     [ 
https://issues.apache.org/jira/browse/MAHOUT-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Xiaomeng Huang updated MAHOUT-1579:
-----------------------------------

    Description: 
As we all know, FileDataModel can only load data from the local filesystem.
But big data is usually stored in a Hadoop filesystem (e.g. HDFS).
If we want to work with data in HDFS, we must run a MapReduce job.
It is necessary to implement a data model which can load data from a Hadoop
filesystem directly.

  was:
As we all know, FileDataModel can only load data from the local filesystem.
But big data is usually stored in a Hadoop filesystem (e.g. HDFS).
If we want to work with data in HDFS, we must run a MapReduce job.
And the distributed algorithm can only process data in a form like
[userID: ItemID1, ItemID2, ItemID3...]
It is necessary to implement a data model which can load data from a Hadoop
filesystem directly.
If the data is not very large, we can use this data model and process data in a
form like [userID,itemID,preference]


> Implement a datamodel which can load data from hadoop filesystem directly
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1579
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1579
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Xiaomeng Huang
>            Priority: Minor
>         Attachments: Mahout-1579.patch
>
>
> As we all know, FileDataModel can only load data from the local filesystem.
> But big data is usually stored in a Hadoop filesystem (e.g. HDFS).
> If we want to work with data in HDFS, we must run a MapReduce job.
> It is necessary to implement a data model which can load data from a Hadoop
> filesystem directly.
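
To illustrate the idea, here is a minimal sketch, not the attached patch, of how
preference data could be read from a stream rather than from a local File. The
class name PreferenceStreamLoader and its load method are hypothetical, and the
[userID,itemID,preference] line format is taken from the earlier description.
The parser only depends on java.io.Reader, so it is an assumption that, with
Hadoop on the classpath, the caller would wrap the stream returned by
FileSystem.open(Path) in an InputStreamReader and pass it in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): parses "userID,itemID,preference"
 * lines from any Reader into an in-memory map keyed by user, then by item.
 */
final class PreferenceStreamLoader {

    private PreferenceStreamLoader() {
    }

    static Map<Long, Map<Long, Float>> load(Reader source) throws IOException {
        Map<Long, Map<Long, Float>> prefs = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            if (fields.length < 3) {
                throw new IOException("Malformed preference line: " + line);
            }
            long userID = Long.parseLong(fields[0].trim());
            long itemID = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            // Group preferences by user; a later line for the same
            // (user, item) pair overwrites the earlier value.
            prefs.computeIfAbsent(userID, k -> new LinkedHashMap<>())
                 .put(itemID, value);
        }
        return prefs;
    }
}
```

Because everything is held in memory, this approach only fits the case where the
data is not very large. Larger inputs would still need a distributed job.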



--
This message was sent by Atlassian JIRA
(v6.2#6252)
