Hi, I made my first pull request here (just a minor fix): https://github.com/apache/incubator-spark/pull/274
Another thing: Spark uses the assembly plugin to generate a fat jar during the build. We have to specify SPARK_JAR on the command line, and then Spark uploads the Spark jar and the user jar to HDFS. But YARN has a configurable option: <property> <name>yarn.application.classpath</name> </property> so we could do it another way: exclude the hadoop-* jars and other Hadoop-related jars during the build, which decreases the size of the fat jar, and then put the Spark fat jar under HADOOP_LIB_DIR. Then Spark doesn't need to upload the Spark jar to HDFS, and we don't need to specify SPARK_JAR on the command line. If there is no problem with this, I would like to make the change.
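
To make the idea concrete, the yarn-site.xml entry might look roughly like the sketch below. The first two entries stand in for whatever the cluster already lists and should be kept as they are; the last entry is only an assumption for illustration, and it presumes that HADOOP_LIB_DIR is defined in the container environment and is the directory where the Spark fat jar was placed.

```xml
<!-- yarn-site.xml (sketch only, not a tested configuration) -->
<property>
  <name>yarn.application.classpath</name>
  <!-- Keep the cluster's existing entries; the final entry is hypothetical
       and assumes the Spark fat jar was copied under HADOOP_LIB_DIR. -->
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_LIB_DIR/*</value>
</property>
```

With something like this in place on every node, the containers would pick up the Spark classes from the local classpath instead of from a jar uploaded to HDFS.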