[ https://issues.apache.org/jira/browse/TINKERPOP-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205115#comment-16205115 ]

ASF GitHub Bot commented on TINKERPOP-1786:
-------------------------------------------

Github user pluradj commented on a diff in the pull request:

    https://github.com/apache/tinkerpop/pull/721#discussion_r144719566
  
    --- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
    @@ -0,0 +1,153 @@
    +////
    +Licensed to the Apache Software Foundation (ASF) under one or more
    +contributor license agreements.  See the NOTICE file distributed with
    +this work for additional information regarding copyright ownership.
    +The ASF licenses this file to You under the Apache License, Version 2.0
    +(the "License"); you may not use this file except in compliance with
    +the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing, software
    +distributed under the License is distributed on an "AS IS" BASIS,
    +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +See the License for the specific language governing permissions and
    +limitations under the License.
    +////
    +[[olap-spark-yarn]]
    +OLAP traversals with Spark on Yarn
    +----------------------------------
    +
    +TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]
    +and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running
    +distributed, analytical graph queries (OLAP) on a computer cluster. The
    +http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases
    +where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs
    +via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be
    +configured differently. This recipe describes this configuration.
    +
    +Approach
    +~~~~~~~~
    +
    +Most configuration problems of TinkerPop with Spark on Yarn stem from three reasons:
    +
    +1. `SparkGraphComputer` creates its own `SparkContext` so it does not get any configs from the usual `spark-submit` command.
    +2. The TinkerPop Spark plugin did not include Spark on Yarn runtime dependencies until version 3.2.7/3.3.1.
    +3. Resolving reason 2 by adding the cluster's `spark-assembly` jar to the classpath creates a host of version
    +conflicts, because Spark 1.x dependency versions have remained frozen since 2014.
    +
    +The current recipe follows a minimalist approach in which no dependencies are added to the dependencies
    +included in the TinkerPop binary distribution. The Hadoop cluster's Spark installation is completely ignored. This
    +approach minimizes the chance of dependency version conflicts.
    +
    +Prerequisites
    +~~~~~~~~~~~~~
    +This recipe is suitable for both a real external and a local pseudo Hadoop cluster. While the recipe is maintained
    +for the vanilla Hadoop pseudo-cluster, it has been reported to work on real clusters with Hadoop distributions
    +from various vendors.
    +
    +If you want to try the recipe on a local Hadoop pseudo-cluster, the easiest way to install
    +it is to look at the install script at https://github.com/apache/tinkerpop/blob/x.y.z/docker/hadoop/install.sh
    +and the `start hadoop` section of https://github.com/apache/tinkerpop/blob/x.y.z/docker/scripts/build.sh.
    +
    +This recipe assumes that you installed the gremlin console with the
    +http://tinkerpop.apache.org/docs/x.y.z/reference/#spark-plugin[spark plugin] (the
    +http://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-plugin[hadoop plugin] is optional). Your Hadoop cluster
    +may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant
    --- End diff --
    
    capitalize LZO


> Recipe and missing manifest items for Spark on Yarn
> ---------------------------------------------------
>
>                 Key: TINKERPOP-1786
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1786
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.3.0, 3.1.8, 3.2.6
>         Environment: gremlin-console
>            Reporter: Marc de Lignie
>            Priority: Minor
>             Fix For: 3.2.7, 3.3.1
>
>
> Thorough documentation for running OLAP queries on Spark on Yarn has been 
> missing, keeping some users from getting the benefits of this nice feature of 
> the Tinkerpop stack and resulting in a significant number of questions on the 
> gremlin users list.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
