I did see that - I was wondering if anyone would try to convert that into TinkerPop documentation of some sort. I'll save my less positive comments for the end and first just say what you could do if everyone is into this idea. You could add it to the "Implementation Recipes" subsection of the "Recipes" document.
> - include the spark-yarn dependency to spark-gremlin

I could be wrong, but I don't think you need to add that as a direct dependency. If we don't need it for compilation it probably shouldn't be in the pom.xml. If you just need extra jars to come with the plugin to the console when you do:

:install org.apache.tinkerpop spark-gremlin 3.2.5

you can just add a manifest entry to spark-gremlin to suck in additional jars as part of that. Note that we already do this with spark-gremlin - see:

https://github.com/apache/tinkerpop/blob/0d532aa91e0c9bc775c36d9572f5f816d323abb6/spark-gremlin/pom.xml#L406

Dependencies are semi-colon separated, so you can just add more after that entry.

As for:

> do you see potential obstacles in accepting a PR along these lines?

Are there any other dependencies to add? Like, the blog post says you tested on Hortonworks Data Platform sandbox - do we need that in the mix too?

....and here's where I get sorta cringy as I alluded to at the start of this......the only problem I'm concerned about is the one you posted:

> the recipe would be maintained and still work after version upgrades

That terrifies me. Personally speaking, I'm terribly uninterested in hunting down spark to the yarn to hadoop to the hortonworks to the cloudera to the mapred-env.sh to the yarn-site.xml type of errors. It's not a nice place at all. If that integration starts to fail for some reason our docs will effectively be broken and someone is going to have to go down into that ungodly hole of demons to unblock us and I'm scared of the dark.

On the flip side, I'm sensitive to users struggling with yarn stuff and every time I see you solve a problem like that on the mailing list, I'm like "All hail the Tamer of Hadoop! Long live HadoopMarc!" - so it seems like this is a need to some degree, and it would be nice if we could make it work somehow.

Anyway - those are my thoughts on the matter. Let's see what other people have to say.

On Thu, Jul 6, 2017 at 5:02 AM, Marc de Lignie <[email protected]> wrote:

> Hi Stephen,
>
> I recently posted recipes on the gremlin and janusgraph users lists to
> configure the binary distributions to work with a spark-yarn cluster. I
> think it would be useful to have the tinkerpop recipe included in Apache
> Tinkerpop repo itself in the following way:
>
> - include the spark-yarn dependency to spark-gremlin
>
> - add the recipe to the docs so that it is actually run in the existing
> documentation environment at build time
>
> In this way:
>
> - the recipe would be less clumsy for users to follow (no external deps)
>
> - the recipe would be maintained and still work after version upgrades
>
> I do not have to remind you that many users have had problems with
> spark-yarn and that the ability to run OLAP queries on an existing cluster
> is one of the attractive features of Tinkerpop.
>
> This brings me to the question: do you see potential obstacles in
> accepting a PR along these lines? I will probably wait for some time until
> actually doing this, though, to have more opportunity to "eat my own
> dogfood" and see if changes are still required.
>
> Cheers, HadoopMarc
