Patrick Wendell created SPARK-11305:
---------------------------------------

             Summary: Remove Third-Party Hadoop Distributions Doc Page
                 Key: SPARK-11305
                 URL: https://issues.apache.org/jira/browse/SPARK-11305
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
            Reporter: Patrick Wendell
            Priority: Critical


There is a fairly old page in our docs that contains a bunch of assorted 
information regarding running Spark on Hadoop clusters. I think this page 
should be removed and merged into other parts of the docs because the 
information is largely redundant and somewhat outdated.

http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html

The page has four sections:

1. Compile-time Hadoop version - I think this information can be removed in 
favor of what is on the "Building Spark" page. These days most "advanced users" 
build without bundling Hadoop, so I'm not sure listing a bunch of different 
Hadoop versions sends the right message.

2. Linking against Hadoop - this doesn't seem to add much beyond what is 
already in the programming guide.

3. Where to run Spark - redundant with the hardware provisioning guide.

4. Inheriting cluster configurations - I think this would be better as a 
section at the end of the configuration page. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
