Shortest path performance in Graphx with Spark

2017-01-10 Thread Gerard Casey
Hello everyone, I am creating a graph from `gz`-compressed `json` files of `edge` and `vertices` type. I have put the files in a Dropbox folder [here][1]. I load and map these `json` records to create the `vertices` and `edge` types required by `graphx` like this: val vertices_raw =

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Sure - I wanted to check with the admin before sharing. I’ve attached it now; does this help? Many thanks again, G Container: container_e34_1479877553404_0174_01_03 on hdp-node12.xcat.cluster_45454_1481228528201

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Right. I’m confident that is set up correctly. I can run the SparkPi test script. The main difference between it and my application is that it doesn’t access HDFS. > On 8 Dec 2016, at 18:43, Marcelo Vanzin <van...@cloudera.com> wrote: > > On Wed, Dec 7, 2016 at 11:54 P

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
you may need to do ticket renewal. Spark will handle it then. I may be wrong > though. > > I guess it gets even more complicated if you need to access other secured > service from Spark like hbase or Phoenix, but i guess this is for another > discussion. > > Regards, > Marc

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
point in the startup of your code but any HDFS access will require a TOKEN or KERBEROS ticket. Cheers, Wilfred > On 8 Dec 2016, at 08:35, Gerard Casey <gerardhughca...@gmail.com> wrote: > > Thanks Marcelo. > > I’ve completely removed it. Ok - even if I read/write from HDFS?

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
uld not be setting those principal / keytab configs. > > Literally all you have to do is login with kinit then run spark-submit. > > Try with the SparkPi example for instance, instead of your own code. > If that doesn't work, you have a configuration issue somewhere. > > On We

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks. I’ve checked the TGT, principal and keytab. Where to next?! > On 7 Dec 2016, at 22:03, Marcelo Vanzin <van...@cloudera.com> wrote: > > On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey <gerardhughca...@gmail.com> > wrote: >> Can anyone point me to a tu

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
19:45, Marcelo Vanzin <van...@cloudera.com> wrote: > > That's not the error, that's just telling you the application failed. > You have to look at the YARN logs for application_1479877553404_0041 > to see why it failed. > > On Mon, Dec 5, 2016 at 10:44 AM, Gerard Casey <gerardhu

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
12/05 18:24:18 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2e566133-d50a-4904-920e-ab5cec07c644 On Mon, Dec 5, 2016 at 10:30 AM, Gerard Casey <gerardhughca...@gmail.com> wrote: > >> On 5 Dec 2016, at 19:26, Marcelo Vanzin <van...@cloudera.com> wrote:

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
cutor-memory 13G --total-executor-cores 32 target/scala-2.10/graphx_sp_2.10-1.0.jar However, the error persists. Any ideas? Thanks Geroid > On 5 Dec 2016, at 13:35, Gerard Casey <gerardhughca...@gmail.com> wrote: > > Hello all, > > I am using Spark with Kerberos authent

Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Hello all, I am using Spark with Kerberos authentication. I can run my code using `spark-shell` fine and I can also use `spark-submit` in local mode (e.g. `--master local[16]`). Both function as expected. local mode - spark-submit --class "graphx_sp" --master local[16] --driver-memory

RDD to HDFS - Kerberos - authentication error - RetryInvocationHandler

2016-11-11 Thread Gerard Casey
Hi all, I have an RDD that I wish to write to HDFS: `data.saveAsTextFile("hdfs://path/vertices")`. This returns: `WARN RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over null. Not retrying because try once and fail.`

GraphX and Public Transport Shortest Paths

2016-11-08 Thread Gerard Casey
Hi all, I’m doing a quick lit review. Suppose I have a graph whose link weights depend on time, e.g. a bus on this road gives a journey time (link weight) of x at time y. This is a classic public transport shortest path problem. This is a weighted directed graph that is time
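For reference, one common way to handle time-dependent link weights like these is a time-dependent Dijkstra search over arrival times. It stays correct as long as every link is FIFO, meaning that leaving later never gets you there earlier. The sketch below is plain Scala rather than GraphX, and the `TDEdge` type, the integer-minute clock and the example timetable in `Demo` are hypothetical choices made for illustration.

```scala
import scala.collection.mutable

// Hypothetical link type: travelTime(t) is the journey time (in minutes)
// when leaving the tail node at minute t.
final case class TDEdge(to: Int, travelTime: Int => Int)

object TimeDependentDijkstra {
  /** Earliest arrival minute at each reachable node, leaving `source` at `departure`.
    * Correct only if every link is FIFO:
    * t1 <= t2 implies t1 + travelTime(t1) <= t2 + travelTime(t2). */
  def earliestArrival(adj: Map[Int, Seq[TDEdge]], source: Int, departure: Int): Map[Int, Int] = {
    val best = mutable.Map(source -> departure)
    val settled = mutable.Set.empty[Int]
    // PriorityQueue is a max-heap, so reverse the ordering to pop the earliest time first.
    val queue = mutable.PriorityQueue.empty[(Int, Int)](Ordering.by[(Int, Int), Int](_._1).reverse)
    queue.enqueue((departure, source))
    while (queue.nonEmpty) {
      val (t, u) = queue.dequeue()
      if (!settled(u)) { // skip stale queue entries for already-settled nodes
        settled += u
        for (e <- adj.getOrElse(u, Seq.empty)) {
          val arrival = t + e.travelTime(t) // link weight evaluated at the time we reach u
          if (best.get(e.to).forall(arrival < _)) {
            best(e.to) = arrival
            queue.enqueue((arrival, e.to))
          }
        }
      }
    }
    best.toMap
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val adj = Map(
      0 -> Seq(TDEdge(1, _ => 5), TDEdge(2, _ => 40)),  // walking links
      1 -> Seq(TDEdge(2, t => (15 - t % 15) % 15 + 10)) // bus every 15 min, 10 min ride
    )
    println(TimeDependentDijkstra.earliestArrival(adj, 0, 0)(2)) // prints 25
  }
}
```

The bus link in `Demo` waits for the next departure on a 15-minute headway, which keeps it FIFO. If a link is not FIFO, a usual fix is to allow waiting at the tail node, i.e. replace the arrival time t + f(t) with the minimum of t' + f(t') over all t' >= t.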

GraphX VerticesRDD issue - java.lang.ArrayStoreException: java.lang.Long

2016-08-18 Thread Gerard Casey
Dear all, I am building a graph from two JSON files. Spark version: 1.6.1. Creating Edge and Vertex RDDs from JSON files. The vertex JSON file looks like this: {"toid": "osgb400031043205", "index": 1, "point": [508180.748, 195333.973]} {"toid": "osgb400031043206",
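GraphX's `VertexId` is a type alias for `Long`, so the string `toid` values in a vertex file like this have to become `Long` keys before the vertex RDD is built (the `index` field is another candidate, if it is unique). The plain-Scala sketch below shows one way to do that mapping. The `VertexRecord` case class, the `VertexIds` helper and the assumption that every `toid` is `osgb` followed by digits that fit in a `Long` are all hypothetical, and the sketch is not a diagnosis of the `ArrayStoreException` in the subject line.

```scala
// Hypothetical record mirroring one line of the vertex JSON file.
final case class VertexRecord(toid: String, index: Long, point: (Double, Double))

object VertexIds {
  private val Prefix = "osgb"

  // Assumption: every toid is "osgb" followed only by decimal digits that fit in a Long.
  def toVertexId(toid: String): Long = {
    val digits = toid.drop(Prefix.length)
    require(toid.startsWith(Prefix) && digits.nonEmpty && digits.forall(_.isDigit),
      s"unexpected toid: $toid")
    digits.toLong
  }

  // GraphX builds vertex RDDs from (VertexId, attribute) pairs, where VertexId = Long.
  // Here the attribute keeps the original toid string and the point coordinates.
  def toVertexPair(r: VertexRecord): (Long, (String, (Double, Double))) =
    (toVertexId(r.toid), (r.toid, r.point))
}

object VertexIdDemo {
  def main(args: Array[String]): Unit = {
    val r = VertexRecord("osgb400031043205", 1L, (508180.748, 195333.973))
    println(VertexIds.toVertexPair(r)) // prints (400031043205,(osgb400031043205,(508180.748,195333.973)))
  }
}
```

In a Spark job, the same function would be applied inside a `map` over the already-parsed vertex records, so that the resulting pairs have the `(Long, attribute)` shape the vertex RDD expects.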