jay vyas created BIGTOP-1177:
--------------------------------

             Summary: Puppet Recipes: Can we modularize them?
                 Key: BIGTOP-1177
                 URL: https://issues.apache.org/jira/browse/BIGTOP-1177
             Project: Bigtop
          Issue Type: Improvement
            Reporter: jay vyas


In the spirit of interoperability, can we work toward modularizing the bigtop 
puppet recipes so that "hadoop_cluster_node" is not defined as an HDFS-specific 
class?

I'm not a puppet expert, but here are two reasons why:

- For HDFS users: In some use cases we might want to use bigtop to provision 
many nodes, only some of which are "data nodes".  For example: let's say our 
cluster is crawling the web in mappers, doing some machine learning, and 
distilling large pages into small relational database tuples that summarize 
the "entities" in each page.  In this case we don't necessarily benefit much 
from locality, because we might be CPU-bound rather than network/IO-bound.  
So we might want to provision a cluster of 50 machines: 40 multicore, 
CPU-heavy ones and just 10 datanodes to support the DFS.  I know this is an 
extreme case, but it's a good example.

- For NON-HDFS users: One important aspect of emerging hadoop workflows is 
HCFS: https://wiki.apache.org/hadoop/HCFS/ -- the idea that filesystems like 
S3, OrangeFS, GlusterFileSystem, etc. are all just as capable as HDFS, 
although not necessarily optimal, of supporting YARN and Hadoop operations.
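As a rough sketch of what I mean (the class and parameter names below are 
hypothetical, not the current bigtop manifests), the node definition could 
just wire together independent role classes, with HDFS as one pluggable 
filesystem role rather than baked in:

```puppet
# Hypothetical sketch -- names are illustrative, not the actual bigtop layout.
# hadoop_cluster_node only composes roles; nothing in it assumes HDFS, so a
# node can be compute-only, or back YARN with an alternate HCFS implementation.
class hadoop_cluster_node (
  $roles      = ['compute'],   # e.g. ['compute'], ['datanode'], or both
  $filesystem = 'hdfs',        # 'hdfs', or an HCFS such as 'glusterfs'
) {
  if 'datanode' in $roles {
    include hadoop::datanode     # only DFS nodes pull in HDFS daemons
  }
  if 'compute' in $roles {
    include hadoop::nodemanager  # YARN compute role, filesystem-agnostic
  }
}
```

With a split like that, the 50-machine example above is just 40 nodes declared 
with roles => ['compute'] and 10 with roles => ['compute', 'datanode'].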



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
