[Hadoop Wiki] Update of "GitAndHadoop" by ArpitAgarwal

Apache Wiki Thu, 06 Sep 2018 13:50:08 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "GitAndHadoop" page has been changed by ArpitAgarwal:
https://wiki.apache.org/hadoop/GitAndHadoop?action=diff&rev1=25&rev2=26

Comment:
Removing content and leaving link to cwiki where new content resides.

- = Git And Hadoop =
+ Content moved to 
https://cwiki.apache.org/confluence/display/HADOOP/Git+And+Hadoop
  
- A lot of people use Git with Hadoop because they have their own patches to 
make to Hadoop, and Git helps them manage those patches.
+ Please email [email protected] for cwiki access.
  
-  * GitHub provides some good lessons on Git at [[http://learn.github.com]]
-  * Apache serves up read-only Git versions of their source at 
[[http://git.apache.org/]]. Committers can commit changes to the writable Git 
repository. See HowToCommit.
- 
- This page tells you how to work with Git. See HowToContribute for 
instructions on building and testing Hadoop.
- <<TableOfContents(4)>>
- 
- 
- == Key Git Concepts ==
- The key concepts of Git.
- 
-  * Git doesn't store changes; it snapshots the entire source tree. That is 
good for fast switching and rollback, but bad for large binaries. (As an 
optimization, a file that hasn't changed is not stored again.)
-  * Git stores everything as SHA1-checksummed objects: file contents, trees, 
tags and commits, where a commit describes the state of the items in the tree.
-  * Git is very branch-centric; you work in your own branch off local or 
central repositories.
-  * You had better enjoy merging.
- 
- 
- == Checking out the source ==
- 
- You need a copy of git on your system. Some IDEs ship with Git support; this 
page assumes you are using the command line.
- 
- Clone a local Git repository from the Apache repository. The Hadoop 
subprojects (common, HDFS, and MapReduce) live inside a combined repository 
called `hadoop.git`.
- 
- {{{
- git clone git://git.apache.org/hadoop.git
- }}}
- 
- '''Committers:''' for read/write access use 
- {{{
- https://git-wip-us.apache.org/repos/asf/hadoop.git
- }}}
- 
- The total download is a few hundred MB, so the initial checkout process works 
best when the network is fast. Once downloaded, Git works offline, though you 
will need to perform your initial builds online so that the build tools can 
download dependencies.
- 
- == Grafts for complete project history ==
- 
- The Hadoop project has undergone some movement in where its component parts 
have been versioned. Because of that, commands like `git log --follow` need a 
little help. To graft the history back together into a coherent whole, 
insert the following contents into `hadoop/.git/info/grafts`:
- 
- {{{
- # Project split
- 5128a9a453d64bfe1ed978cf9ffed27985eeef36 
6c16dc8cf2b28818c852e95302920a278d07ad0c
- 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 
6c16dc8cf2b28818c852e95302920a278d07ad0c
- 546d96754ffee3142bcbbf4563c624c053d0ed0d 
6c16dc8cf2b28818c852e95302920a278d07ad0c
- # Project un-split in new writable git repo
- a196766ea07775f18ded69bd9e8d239f8cfd3ccc 
928d485e2743115fe37f9d123ce9a635c5afb91a
- cd66945f62635f589ff93468e94c0039684a8b6d 
77f628ff5925c25ba2ee4ce14590789eb2e7b85b
- }}}
- 
- You can then use commands like `git log --follow` with success.
- 
- == Forking onto GitHub ==
- 
- You can create your own fork of the ASF project. This is required if you want 
to contribute patches by submitting pull requests. However you can choose to 
skip this step and attach patch files directly on Apache Jiras.
- 
-  1. Create a GitHub login at http://github.com/ and add your public SSH keys.
-  1. Go to https://github.com/apache/hadoop/
-  1. Click fork in the GitHub UI. This gives you your own repository URL.
-  1. In the existing clone, add the new repository: 
-  {{{git remote add -f github git@github.com:MYUSERNAMEHERE/hadoop.git}}}
- 
- This gives you a local repository with two remote repositories: {{{origin}}} 
and {{{github}}}. {{{origin}}} has the Apache branches, which you can update 
whenever you want to get the latest ASF version:
- 
- {{{
-  git checkout -b trunk origin/trunk
-  git pull origin
- }}}
- 
- Your own branches can be merged with trunk, and pushed out to GitHub. To 
generate patches for attaching to Apache JIRAs, check everything in to your 
specific branch, merge that with (a recently pulled) trunk, then diff the two:
- {{{ git diff trunk > ../hadoop-patches/HADOOP-XYX.patch }}}
- 
- 
- == Branching ==
- 
- Git makes it easy to branch. The recommended process for working with Apache 
projects is one branch per JIRA issue. That makes it easy to isolate 
development and track the progress of each change. It does mean that if you 
release your own branch, one that merges in more than one issue, you have to 
invest some effort in merging everything in. Try not to make changes in 
different branches that are hard to merge, and learn your way round the 
{{{git rebase}}} command to handle changes across branches. Better yet: do not 
use rebase once you have created a chain of branches that depend on each other.
- 
- === Creating the branch ===
- 
- Creating a branch is quick and easy:
- {{{
- #start off in the apache trunk
- git checkout trunk
- #create a new branch from trunk
- git branch HDFS-775
- #switch to it
- git checkout HDFS-775
- #show which branch you are on
- git branch
- }}}
- 
- Remember, this branch is local to your machine. Nobody else can see it until 
you push up your changes or generate a patch, or you make your machine visible 
over the network to interested parties.
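The create-and-switch steps above can also be done in one command with `git checkout -b`. The following is a minimal sketch, assuming a throwaway repository in a temporary directory stands in for a real Hadoop clone; the demo identity and the empty initial commit are placeholders, not part of the Hadoop workflow.

```shell
# Scratch repository standing in for a Hadoop clone (hypothetical setup).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q --allow-empty -m "initial commit"
git branch -m trunk            # name the branch trunk, as in the Apache repository

# Create the issue branch from trunk and switch to it in a single step.
git checkout -q -b HDFS-775

# Print the name of the branch that is currently checked out.
current=$(git rev-parse --abbrev-ref HEAD)
echo "$current"
```

`git branch` with no arguments still lists every local branch and marks the current one with an asterisk.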
- 
- 
- == Creating Patches for attachment to JIRA issues ==
- 
- Assuming your trunk repository is in sync with the Apache projects, you can 
use {{{git diff}}} to create a patch file.
- First, have a directory for your patches:
- {{{
- mkdir ../hadoop-patches
- }}}
- Then generate a patch file listing the differences between your trunk and 
your branch
- {{{
- git diff --no-prefix trunk > ../hadoop-patches/HDFS-775-1.patch
- }}}
- The patch file is an extended version of the unified patch format used by 
other tools; type {{{git help diff}}} to get more details on it. Here is what 
the patch file in this example looks like:
- {{{
- cat ../hadoop-patches/HDFS-775-1.patch
- diff --git src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java 
src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
- index 42ba15e..6383239 100644
- --- src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
- +++ src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
- @@ -355,12 +355,14 @@ public class FSDataset implements FSConstants, 
FSDatasetInterface {
-        return dfsUsage.getUsed();
-      }
- 
- +    /**
- +     * Calculate the capacity of the filesystem, after removing any
- +     * reserved capacity.
- +     * @return the unreserved number of bytes left in this filesystem. May 
be zero.
- +     */
-      long getCapacity() throws IOException {
- -      if (reserved > usage.getCapacity()) {
- -        return 0;
- -      }
- -
- -      return usage.getCapacity()-reserved;
- +      long remaining = usage.getCapacity() - reserved;
- +      return remaining > 0 ? remaining : 0;
-      }
- 
-      long getAvailable() throws IOException {
- 
- }}}
- It is essential that patches for JIRA issues are generated with the 
{{{--no-prefix}}} option. Without it, an extra directory level is added to each 
path, and the patches can only be applied with a {{{patch -p1}}} call, ''which 
Hudson does not know to do''. If you want your patches to take, this is what 
you have to do. You can test this yourself by running a command like 
{{{patch -p0 < ../hadoop-patches/HDFS-775-1.patch}}} in a copy of the Git 
source tree.
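The effect of `--no-prefix` can be checked in a scratch repository. This is only a sketch under stated assumptions: the file `FSDataset.txt`, the demo identity and the temporary directories are hypothetical, and `git apply -p0 --check` is used here as a stand-in for the `patch -p0` test described above.

```shell
# Scratch repository (hypothetical file and branch names), not a real Hadoop tree.
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
echo "capacity" > FSDataset.txt
git add FSDataset.txt
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q -m "initial commit"
git branch -m trunk

# Make a change on an issue branch.
git checkout -q -b HDFS-775
echo "remaining" >> FSDataset.txt
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q -am "HDFS-775 demo change"

# Generate the patch without the a/ and b/ path prefixes.
mkdir ../hadoop-patches
git diff --no-prefix trunk > ../hadoop-patches/HDFS-775-1.patch

# Back on trunk, check that the patch applies at strip level 0.
git checkout -q trunk
applies=no
git apply -p0 --check ../hadoop-patches/HDFS-775-1.patch && applies=yes

# Show the patch header; the paths carry no a/ or b/ prefix.
head -n 4 ../hadoop-patches/HDFS-775-1.patch
```

Dropping `--no-prefix` from the `git diff` call makes the header paths start with `a/` and `b/`, which is the form that needs `-p1`.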
- 
- === Updating your patch ===
- 
- If your patch is not immediately accepted, do not be offended: it happens to 
us all. It does introduce a problem: your branches become out of date. You need 
to check out the latest Apache version, merge your branches with it, and then 
push the changes back to GitHub:
- 
- {{{
-  git checkout trunk
-  git pull origin
-  git checkout mybranch
-  git merge trunk
-  git push github mybranch
- }}}
- 
- Your branch is now up to date, and new diffs can be created and attached to 
JIRA issues.
- 
- === Deriving Branches from Branches ===
- 
- If you have one patch that depends upon another, you should have a separate 
branch for each one. Simply merge the changes from the first branch into the 
second, so that it is always kept up to date with the first changes. To create 
a patch file for submission as a JIRA patch, do a diff between the two 
branches, not against trunk.
- 
- '''Do not play with rebasing once you start doing this, as you will make 
merging a nightmare.'''
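A diff between two dependent branches can be sketched as follows. The branch names `HDFS-775` and `HDFS-776`, the file names and the demo identity are all hypothetical, and the repository is a throwaway one in a temporary directory.

```shell
# Scratch repository with a trunk and two chained issue branches (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "base" > base.txt
git add base.txt
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q -m "initial commit"
git branch -m trunk

# First issue branch, created from trunk.
git checkout -q -b HDFS-775
echo "first change" > first.txt
git add first.txt
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q -m "HDFS-775 demo change"

# Second issue branch, created from the first because it depends on it.
git checkout -q -b HDFS-776
echo "second change" > second.txt
git add second.txt
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q -m "HDFS-776 demo change"

# Diff against the branch this one depends on, not against trunk.
files=$(git diff --name-only HDFS-775 HDFS-776)
echo "$files"
```

Only the second branch's own change appears in the listing; a diff against trunk would also include the first branch's change.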
- 
- === What to do when your patch is committed ===
- 
- Once your patch is committed into Git, you do not need the branch any more. 
You can delete it straight away, but it is safer to verify that the patch is 
completely merged in.
- 
- Pull down the latest trunk and verify that the patch branch is synchronized:
- 
- {{{
-  git checkout trunk
-  git pull origin
-  git checkout mybranch
-  git merge trunk
-  git diff trunk
- }}}
- 
- The output of the last command should be empty: the two branches should be 
identical. You can then prove to Git that this is true by switching back to the 
trunk branch and merging in the branch, an operation which will not change the 
source tree but will update Git's branch graph.
- 
- {{{
-  git checkout trunk
-  git merge mybranch
- }}}
- 
- Now you can delete the branch without being warned by Git:
- {{{
-  git branch -d mybranch
- }}}
- 
- Finally, propagate that deletion to your private GitHub repository:
- {{{
-  git push github :mybranch
- }}}
- 
- This odd syntax says "push nothing to github/mybranch".
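The local part of this clean-up can be tried in a scratch repository. This is a minimal sketch: the empty commits and the demo identity are placeholders, and `git branch --merged` is used as an extra check that is not part of the steps above.

```shell
# Scratch repository with a trunk and one finished branch (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q --allow-empty -m "initial commit"
git branch -m trunk
git checkout -q -b mybranch
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q --allow-empty -m "work on mybranch"

# Merge the branch into trunk; in this sketch it is a fast-forward.
git checkout -q trunk
git merge -q mybranch

# List the branches that are fully merged into trunk.
merged=$(git branch --merged trunk)
echo "$merged"

# Safe delete: -d refuses to delete a branch that is not fully merged.
git branch -d mybranch
remaining=$(git branch --list mybranch)
```

Because the merge has been recorded, `git branch -d` deletes the branch without complaint.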
- 

