[
https://issues.apache.org/jira/browse/BIGTOP-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889045#comment-13889045
]
jay vyas commented on BIGTOP-1200:
----------------------------------
Thanks, Cos, for your feedback. Now my turn to respond :)
1) Purpose of this patch: It's good to decouple hadoop services from HDFS
semantics wherever possible. This will pave the way for using bigtop to
deploy more than just standard HDFS-based hadoop services. That's the main
purpose of this patch. As a side effect, it also cleans up some code,
incrementally improving issues like (2) below:
2) Regarding the "partially initialized file systems": That is a great point!
It is actually why we've put in "mkdir -p" instead of just "mkdir" as
*part of this patch* :) . Thus, the "partially initialized FS" problem is
dealt with much more flexibly by the init-hcfs.sh script than by the original
init-hdfs.sh script.
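To illustrate point (2), here is a minimal local sketch of why "mkdir -p" makes reruns safe. Plain mkdir stands in for "hadoop fs -mkdir" here, and the demo path is made up for illustration:

```shell
# Plain mkdir aborts on a path that already exists, so rerunning an init
# script over a partially initialized FS fails partway through:
mkdir /tmp/hcfs-idempotency-demo
mkdir /tmp/hcfs-idempotency-demo 2>/dev/null || echo "plain mkdir: already exists"

# mkdir -p succeeds whether or not the path exists, so the whole script
# can simply be rerun to finish off a partial initialization:
mkdir -p /tmp/hcfs-idempotency-demo && echo "mkdir -p: ok"

# clean up the demo directory
rmdir /tmp/hcfs-idempotency-demo
```

The same contract holds for "hadoop fs -mkdir -p", which is what makes init-hcfs.sh rerunnable against a half-built tree.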
3) Regarding "very slow performance of init-hdfs": You are right that your
idea to use direct DFS APIs could be good for performance. This is
synergistic with init-hcfs.sh: by making a generic init-hcfs.sh script
(look closely at the patch and you will see that init-hdfs.sh is now much
simpler), it paves the way for you HDFS folks to create an optimized HDFS
path for file creation, while also contributing an HCFS-compliant alternative
which the HCFS community can use with our bigtop-based deployments.
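As a middle ground before any direct-DFS-API work lands, "hadoop fs -mkdir -p" accepts several paths in one invocation, so much of the per-command JVM startup cost visible in the trace below could be amortized by batching. This is a sketch only, not part of the attached patch:

```shell
# Instead of one 'su ... hadoop fs -mkdir -p' invocation per directory:
#   su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp /var /var/log /hbase /solr'
# POSIX mkdir -p behaves the same way with multiple operands, which we
# can demonstrate locally:
mkdir -p /tmp/hcfs-batch-demo/a /tmp/hcfs-batch-demo/b /tmp/hcfs-batch-demo/c
ls /tmp/hcfs-batch-demo    # shows a, b, c

# clean up the demo tree
rm -r /tmp/hcfs-batch-demo
```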
I think the 3 bullets above are a good start to an important debate that NEEDS
to happen in the open.
Let's please keep this debate going. The dialogue is probably just as
important as the patch.
* Now, in case that's not a compelling argument for this patch, here's an
alternative approach :) *
If you *still feel* that having init-hdfs.sh and init-hcfs.sh as side-by-side
utilities is bad, then maybe I can add init-hcfs.sh into bigtop so that the
broader FileSystem ecosystem (which will ultimately contribute back and
improve HDFS by strengthening the robustness of its interfaces and tests)
has a foothold in bigtop upon which we can innovate, further diversifying
the bigtop stack so that it can support a more diverse range of hadoop
deployments.
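For readers skimming the thread, the facade relationship set up by the patch can be sketched roughly as follows. Shell functions stand in for the two scripts, the real su/hadoop calls are elided as comments, and the actual attached patch is authoritative:

```shell
#!/bin/bash
# init_hcfs: the generic, filesystem-agnostic initializer. It takes the
# superuser account as its single argument instead of hardcoding "hdfs".
init_hcfs() {
  [ $# -ne 1 ] && { echo "usage: init_hcfs <superuser>" >&2; return 1; }
  local super_user=$1
  echo "Initializing the DFS with super user : $super_user"
  # Real script: su -s /bin/bash "$super_user" -c '/usr/bin/hadoop fs -mkdir -p /tmp'
  # ...and so on for /var, /user, /hbase, etc.
}

# init_hdfs: reduced to a thin facade that delegates with superuser=hdfs.
init_hdfs() {
  echo 'Now initializing the Distributed File System with root=HDFS'
  init_hcfs hdfs
}

init_hdfs
```

Any other HCFS deployment then gets the same directory semantics by calling the generic entry point with its own superuser account.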
> Implement init-hcfs.sh to define filesystem semantics, and refactor
> init-hdfs.sh into a facade.
> ------------------------------------------------------------------------------------------------
>
> Key: BIGTOP-1200
> URL: https://issues.apache.org/jira/browse/BIGTOP-1200
> Project: Bigtop
> Issue Type: Improvement
> Reporter: jay vyas
> Attachments: BIGTOP-1200.patch
>
>
> One of the really useful artifacts in bigtop is the init-hdfs.sh script. It
> defines ecosystem semantics and expectations for hadoop clusters.
> Other HCFS filesystems can leverage the logic in this script quite easily if
> we decouple its implementation from being HDFS-specific, by specifying a
> "SUPERUSER" parameter to replace "hdfs".
> And yes, we can still have the init-hdfs.sh convenience script, which just
> calls "init-hcfs.sh hdfs".
> Initial tests in puppet VMs pass. (attaching patch with this JIRA)
> {noformat}
> [root@vagrant bigtop-puppet]# ./init-hdfs.sh
> + echo 'Now initializing the Distributed File System with root=HDFS'
> Now initializing the Distributed File System with root=HDFS
> + ./init-hcfs.sh hdfs
> + '[' 1 -ne 1 ']'
> + SUPER_USER=hdfs
> + echo 'Initializing the DFS with super user : hdfs'
> Initializing the DFS with super user : hdfs
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1777 /tmp'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var/log'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1775 /var/log'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown yarn:mapred /var/log'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /tmp/hadoop-yarn'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown -R mapred:mapred
> /tmp/hadoop-yarn'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /tmp/hadoop-yarn'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /var/log/hadoop-yarn/apps'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1777
> /var/log/hadoop-yarn/apps'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown yarn:mapred
> /var/log/hadoop-yarn/apps'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /hbase'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hbase:hbase /hbase'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /solr'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown solr:solr /solr'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /benchmarks'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /benchmarks'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod 755 /user'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hdfs /user'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/history'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown mapred:mapred
> /user/history'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod 755 /user/history'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/jenkins'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/jenkins'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown jenkins /user/jenkins'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hive /user/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/root'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/root'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown root /user/root'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/hue'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/hue'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hue /user/hue'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/sqoop'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/sqoop'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown sqoop /user/sqoop'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 777 /user/oozie'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown -R oozie /user/oozie'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /user/oozie/share/lib'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/hive'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/mapreduce-streaming'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/distcp'
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p
> /user/oozie/share/lib/pig'
> + ls '/usr/lib/hive/lib/*.jar'
> + ls /usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.6-alpha.jar
> /usr/lib/hadoop-mapreduce/hadoop-streaming.jar
> + su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -put
> /usr/lib/hadoop-mapreduce/hadoop-streaming*.jar
> /user/oozie/share/lib/mapreduce-streaming'
> put:
> `/user/oozie/share/lib/mapreduce-streaming/hadoop-streaming-2.0.6-alpha.jar':
> File exists
> put: `/user/oozie/share/lib/mapreduce-streaming/hadoop-streaming.jar': File
> exists
> [root@vagrant bigtop-puppet]#
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)