[ https://issues.apache.org/jira/browse/HADOOP-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4304:
----------------------------------

    Status: Open  (was: Patch Available)

I love the short word count example! Looking through the patch, there are a 
couple of issues:

1. The install can't require root privileges.
2. It shouldn't install Dumbo into Python's system directory. Instead, it should 
use the distributed cache (including creating symlinks into the cwd) and load it 
from there.
3. ant package copies directories into the build/hadoop-0.20-dev/contrib/dumbo 
directory, but they don't seem to be the right ones. You don't need the classes 
or test directories, and the example directory seems to be empty.
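For point 2, here is a minimal sketch of the consuming side, assuming the Python module is shipped as an archive through the distributed cache and symlinked into the task's working directory (for Streaming, something like -cacheArchive <archive-uri>#dumbolib). The link name dumbolib and the helper add_cwd_symlink_to_path are hypothetical and not part of the patch:

```python
import os
import sys


def add_cwd_symlink_to_path(link_name):
    """Prepend ./<link_name> to sys.path so modules unpacked there can be imported.

    Assumption (hypothetical setup): the archive was shipped via the
    distributed cache and symlinked into the task's working directory
    under link_name, so nothing is installed into Python's system
    directory and no root privileges are needed.
    """
    path = os.path.join(os.getcwd(), link_name)
    # isdir() follows symlinks, so a cache symlink pointing at a directory passes.
    if not os.path.isdir(path):
        raise RuntimeError(
            "no distributed cache symlink %r in %s" % (link_name, os.getcwd())
        )
    # Only insert once, so repeated calls don't grow sys.path.
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

A mapper or reducer script would call add_cwd_symlink_to_path("dumbolib") before its import of the Dumbo module, so the import resolves from the task's cwd rather than from site-packages.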

> Add Dumbo to contrib
> --------------------
>
>                 Key: HADOOP-4304
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4304
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Klaas Bosteels
>            Assignee: Klaas Bosteels
>            Priority: Minor
>         Attachments: hadoop-4304-v2.patch, hadoop-4304-v3.patch, 
> hadoop-4304.patch
>
>
> Originally, Dumbo was a simple Python module developed at Last.fm to make 
> writing and running Hadoop Streaming programs very easy, but now it also 
> consists of some (until now unreleased) helper code in Java (although it 
> can still be used without the Java code). We propose to add Dumbo to 
> "src/contrib" such that the Java classes get built and installed together with 
> the rest of Hadoop, and the Python module can be installed separately at 
> will. A tar.gz of the directory that would have to be added to "src/contrib" 
> is available at
> http://static.last.fm/dumbo/dumbo-contrib.tar.gz
> and more info about Dumbo can be found here:
> * Basic documentation: http://github.com/klbostee/dumbo/wikis
> * Presentation at HUG (where it was first suggested to add Dumbo to contrib): 
> http://skillsmatter.com/podcast/home/dumbo-hadoop-streaming-made-elegant-and-easy
> * Initial announcement: 
> http://blog.last.fm/2008/05/29/python-hadoop-flying-circus-elephant
> For some of the more advanced features of Dumbo (in particular the ones for 
> which the Java classes are needed) there is no public documentation yet, but 
> we could easily fill that gap by moving some of the internal Last.fm 
> documentation to the Hadoop wiki.

-- 
This message is automatically generated by JIRA.