> 1) Why 0.20.0 in the following command? Why not 0.20.1?
> 2) Why do I have to download all previous versions of Hadoop? Does Hive need these?
Because the Hadoop API changes from version to version, Hive accesses it through a shim layer. The build is set up to compile shim libraries for the four most recent minor versions of Hadoop (0.17, 0.18, 0.19, and 0.20). To do this it needs access to the Hadoop jars, which it gets by downloading the Hadoop release tarballs.

As far as the build results go, there is no difference between specifying 0.20.1 and 0.20.0, since the API stays the same between patch versions. However, you'll end up doing extra work if you specify 0.20.1, because the build script always downloads and builds shims against 0.20.0. There is no point in also telling it to download and build shims against 0.20.1, since the shims built against Hadoop 0.20.0 will work just as well against 0.20.1.

Carl
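P.S. To illustrate why the patch version makes no difference, here is a rough sketch of how a shim layer can pick an implementation from the major.minor part of the version string alone. This is not Hive's actual code: the class name `ShimSelector`, the method names, and the shim labels are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class ShimSelector {
    // Hypothetical labels, one per Hadoop minor version the build compiles shims for.
    private static final Map<String, String> SHIMS = new HashMap<>();
    static {
        SHIMS.put("0.17", "shims-0.17");
        SHIMS.put("0.18", "shims-0.18");
        SHIMS.put("0.19", "shims-0.19");
        SHIMS.put("0.20", "shims-0.20");
    }

    // Reduce a full version such as "0.20.1" to its major.minor prefix, "0.20".
    static String minorVersion(String fullVersion) {
        String[] parts = fullVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version: " + fullVersion);
        }
        return parts[0] + "." + parts[1];
    }

    // Look up the shim by minor version only; the patch level is ignored.
    static String shimFor(String hadoopVersion) {
        String key = minorVersion(hadoopVersion);
        String shim = SHIMS.get(key);
        if (shim == null) {
            throw new IllegalArgumentException("No shim for Hadoop " + key);
        }
        return shim;
    }

    public static void main(String[] args) {
        // Both patch releases resolve to the same shim.
        System.out.println(shimFor("0.20.0"));
        System.out.println(shimFor("0.20.1"));
    }
}
```

Under this assumption, 0.20.0 and 0.20.1 map to the same entry, which is why building a second set of shims against 0.20.1 would add nothing.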
