Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-12 Thread Tathagata Das
Can you give me the whole logs? TD On Tue, Feb 10, 2015 at 10:48 AM, Jon Gregg jonrgr...@gmail.com wrote: OK that worked and getting close here ... the job ran successfully for a bit and I got output for the first couple buckets before getting a java.lang.Exception: Could not compute split,

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-10 Thread Sandy Ryza
Is the SparkContext you're using the same one that the StreamingContext wraps? If not, I don't think using two is supported. -Sandy On Tue, Feb 10, 2015 at 9:58 AM, Jon Gregg jonrgr...@gmail.com wrote: I'm still getting an error. Here's my code, which works successfully when tested using

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-10 Thread Jon Gregg
I'm still getting an error. Here's my code, which works successfully when tested using spark-shell: val badIPs = sc.textFile(/user/sb/badfullIPs.csv).collect val badIpSet = badIPs.toSet val badIPsBC = sc.broadcast(badIpSet) The job looks OK from my end: 15/02/07 18:59:58

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-10 Thread Jon Gregg
They're separate in my code, how can I combine them? Here's what I have: val sparkConf = new SparkConf() val ssc = new StreamingContext(sparkConf, Seconds(bucketSecs)) val sc = new SparkContext() On Tue, Feb 10, 2015 at 1:02 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Is

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-10 Thread Sandy Ryza
You should be able to replace that second line with val sc = ssc.sparkContext On Tue, Feb 10, 2015 at 10:04 AM, Jon Gregg jonrgr...@gmail.com wrote: They're separate in my code, how can I combine them? Here's what I have: val sparkConf = new SparkConf() val ssc = new

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-10 Thread Jon Gregg
OK that worked and getting close here ... the job ran successfully for a bit and I got output for the first couple buckets before getting a java.lang.Exception: Could not compute split, block input-0-1423593163000 not found error. So I bumped up the memory at the command line from 2 gb to 5 gb,

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-06 Thread Jon Gregg
OK I tried that, but how do I convert an RDD to a Set that I can then broadcast and cache? val badIPs = sc.textFile(hdfs:///user/jon/+ badfullIPs.csv) val badIPsLines = badIPs.getLines val badIpSet = badIPsLines.toSet val badIPsBC = sc.broadcast(badIpSet) produces the

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-06 Thread Sandy Ryza
You can call collect() to pull in the contents of an RDD into the driver: val badIPsLines = badIPs.collect() On Fri, Feb 6, 2015 at 12:19 PM, Jon Gregg jonrgr...@gmail.com wrote: OK I tried that, but how do I convert an RDD to a Set that I can then broadcast and cache? val badIPs =

How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-05 Thread YaoPau
I have a file badFullIPs.csv of bad IP addresses used for filtering. In yarn-client mode, I simply read it off the edge node, transform it, and then broadcast it: val badIPs = fromFile(edgeDir + badfullIPs.csv) val badIPsLines = badIPs.getLines val badIpSet = badIPsLines.toSet

Re: How to broadcast a variable read from a file in yarn-cluster mode?

2015-02-05 Thread Sandy Ryza
Hi Jon, You'll need to put the file on HDFS (or whatever distributed filesystem you're running on) and load it from there. -Sandy On Thu, Feb 5, 2015 at 3:18 PM, YaoPau jonrgr...@gmail.com wrote: I have a file badFullIPs.csv of bad IP addresses used for filtering. In yarn-client mode, I