[
https://issues.apache.org/jira/browse/BIGTOP-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889045#comment-13889045
]
jay vyas commented on BIGTOP-1200:
----------------------------------
Thanks, Cos, for your feedback. Now my turn to respond :)
1) Purpose of this patch: It's good to decouple hadoop services from HDFS
semantics wherever possible. This will pave the way for using bigtop to
deploy more than just standard HDFS-based hadoop services. That's the main
purpose of this patch. As a side effect, it also cleans up some code,
incrementally improving issues like (2) below:
2) Regarding the "partially initialized file systems": That is a great point!
It is actually why we've put in "mkdir -p" instead of just "mkdir" as
*part of this patch* :) . Thus, the "partially initialized FS" problem is
dealt with much more flexibly by the init-hcfs.sh script than by the original
init-hdfs.sh script.
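To illustrate point (2), here is a minimal local sketch of why "mkdir -p" makes reruns safe. Plain mkdir stands in for "hadoop fs -mkdir" here, and the demo path is made up for illustration:

```shell
# Plain mkdir aborts on a path that already exists, so rerunning an init
# script over a partially initialized FS fails partway through:
mkdir /tmp/hcfs-idempotency-demo
mkdir /tmp/hcfs-idempotency-demo 2>/dev/null || echo "plain mkdir: already exists"

# mkdir -p succeeds whether or not the path exists, so the whole script
# can simply be rerun to finish off a partial initialization:
mkdir -p /tmp/hcfs-idempotency-demo && echo "mkdir -p: ok"

# clean up the demo directory
rmdir /tmp/hcfs-idempotency-demo
```

The same contract holds for "hadoop fs -mkdir -p", which is what makes init-hcfs.sh rerunnable against a half-built tree.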
3) Regarding "very slow performance of init-hdfs": You are right that your
idea to use direct DFS APIs could be good for performance. This is
synergistic with init-hcfs.sh: by making a generic init-hcfs.sh script
(look closely at the patch and you will see that init-hdfs.sh is now much
simpler), it paves the way for you HDFS folks to create an optimized HDFS
path for file creation, while also contributing an HCFS-compliant alternative
which the HCFS community can use with our bigtop-based deployments.
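As a middle ground before any direct-DFS-API work lands, "hadoop fs -mkdir -p" accepts several paths in one invocation, so much of the per-command JVM startup cost visible in the trace below could be amortized by batching. This is a sketch only, not part of the attached patch:

```shell
# Instead of one 'su ... hadoop fs -mkdir -p' invocation per directory:
#   su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp /var /var/log /hbase /solr'
# POSIX mkdir -p behaves the same way with multiple operands, which we
# can demonstrate locally:
mkdir -p /tmp/hcfs-batch-demo/a /tmp/hcfs-batch-demo/b /tmp/hcfs-batch-demo/c
ls /tmp/hcfs-batch-demo    # shows a, b, c

# clean up the demo tree
rm -r /tmp/hcfs-batch-demo
```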
I think the 3 bullets above are a good start to an important debate that NEEDS
to happen in the open.
Let's please keep this debate going. The dialogue is probably just as
important as the patch.
* Now, in case that's not a compelling argument for this patch, here's an
alternative approach :) *
If you *still feel* that having init-hdfs.sh and init-hcfs.sh as side-by-side
utilities is bad, then maybe I can add init-hcfs.sh into bigtop so that the
broader FileSystem ecosystem (which will ultimately contribute back and
improve HDFS by strengthening the robustness of its interfaces and tests)
has a foothold in bigtop upon which we can innovate, further diversifying
the bigtop stack so that it can support a more diverse range of hadoop
deployments.
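For readers skimming the thread, the facade relationship set up by the patch can be sketched roughly as follows. Shell functions stand in for the two scripts, the real su/hadoop calls are elided as comments, and the actual attached patch is authoritative:

```shell
#!/bin/bash
# init_hcfs: the generic, filesystem-agnostic initializer. It takes the
# superuser account as its single argument instead of hardcoding "hdfs".
init_hcfs() {
  [ $# -ne 1 ] && { echo "usage: init_hcfs <superuser>" >&2; return 1; }
  local super_user=$1
  echo "Initializing the DFS with super user : $super_user"
  # Real script: su -s /bin/bash "$super_user" -c '/usr/bin/hadoop fs -mkdir -p /tmp'
  # ...and so on for /var, /user, /hbase, etc.
}

# init_hdfs: reduced to a thin facade that delegates with superuser=hdfs.
init_hdfs() {
  echo 'Now initializing the Distributed File System with root=HDFS'
  init_hcfs hdfs
}

init_hdfs
```

Any other HCFS deployment then gets the same directory semantics by calling the generic entry point with its own superuser account.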
> Implement init-hcfs.sh to define filesystem semantics, and refactor
> init-hdfs.sh into a facade.
> ------------------------------------------------------------------------------------------------
>
> Key: BIGTOP-1200
> URL: https://issues.apache.org/jira/browse/BIGTOP-1200
> Project: Bigtop
> Issue Type: Improvement
> Reporter: jay vyas
> Attachments: BIGTOP-1200.patch
>
>
> One of the really useful artifacts in bigtop is the init-hdfs.sh script. It
> defines ecosystem semantics and expectations for hadoop clusters.
> Other HCFS filesystems can leverage the logic in this script quite easily if
> we decouple its implementation from being HDFS-specific, by specifying a
> "SUPERUSER" parameter to replace "hdfs".
> And yes, we can still have the init-hdfs.sh convenience script, which just
> calls "init-hcfs.sh hdfs".
> Initial tests in puppet VMs pass. (attaching patch with this JIRA)
> {noformat}
> [root@vagrant bigtop-puppet]# ./init-hdfs.sh
> + echo 'Now initializing the Distributed File System with root=HDFS'
> Now initializing the Distributed File System with root=HDFS
> + ./init-hcfs.sh hdfs
> + '[' 1 -ne 1 ']'
> + SUPER_USER=hdfs
> + echo 'Initializing the DFS with super user : hdfs'
> Initializing the DFS with super user : hdfs
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1777 /tmp'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var/log'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1775 /var/log'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown yarn:mapred /var/log'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp/hadoop-yarn'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown -R mapred:mapred
> /tmp/hadoop-yarn'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /tmp/hadoop-yarn'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /var/log/hadoop-yarn/apps'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1777
> /var/log/hadoop-yarn/apps'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown yarn:mapred
> /var/log/hadoop-yarn/apps'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /hbase'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hbase:hbase /hbase'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /solr'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown solr:solr /solr'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /benchmarks'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /benchmarks'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod 755 /user'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hdfs /user'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/history'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown mapred:mapred
> /user/history'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod 755 /user/history'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/jenkins'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/jenkins'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown jenkins /user/jenkins'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hive /user/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/root'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/root'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown root /user/root'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/hue'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/hue'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hue /user/hue'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/sqoop'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/sqoop'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown sqoop /user/sqoop'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/oozie'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown -R oozie /user/oozie'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share/lib'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/mapreduce-streaming'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/distcp'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/pig'
> + ls '/usr/lib/hive/lib/*.jar'
> + ls /usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.6-alpha.jar
> /usr/lib/hadoop-mapreduce/hadoop-streaming.jar
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -put
> /usr/lib/hadoop-mapreduce/hadoop-streaming*.jar
> /user/oozie/share/lib/mapreduce-streaming'
> put:
> `/user/oozie/share/lib/mapreduce-streaming/hadoop-streaming-2.0.6-alpha.jar':
> File exists
> put: `/user/oozie/share/lib/mapreduce-streaming/hadoop-streaming.jar': File
> exists
> [root@vagrant bigtop-puppet]#
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)