Can you give me the whole logs?
TD
On Tue, Feb 10, 2015 at 10:48 AM, Jon Gregg jonrgr...@gmail.com wrote:
Is the SparkContext you're using the same one that the StreamingContext
wraps? If not, I don't think using two is supported.
-Sandy
On Tue, Feb 10, 2015 at 9:58 AM, Jon Gregg jonrgr...@gmail.com wrote:
I'm still getting an error. Here's my code, which works successfully when
tested using spark-shell:
val badIPs = sc.textFile("/user/sb/badfullIPs.csv").collect
val badIpSet = badIPs.toSet
val badIPsBC = sc.broadcast(badIpSet)
The job looks OK from my end:
15/02/07 18:59:58
They're separate in my code, how can I combine them? Here's what I have:
val sparkConf = new SparkConf()
val ssc = new StreamingContext(sparkConf, Seconds(bucketSecs))
val sc = new SparkContext()
On Tue, Feb 10, 2015 at 1:02 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
You should be able to replace that second line with
val sc = ssc.sparkContext
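Put together, a sketch of the combined setup Sandy is suggesting (the bucketSecs value here is illustrative; Spark must be on the classpath):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val bucketSecs = 60  // illustrative batch interval
val sparkConf = new SparkConf()
val ssc = new StreamingContext(sparkConf, Seconds(bucketSecs))
val sc = ssc.sparkContext  // reuse the SparkContext the StreamingContext wraps
```

This avoids constructing a second SparkContext, which is what was tripping up the streaming job.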
On Tue, Feb 10, 2015 at 10:04 AM, Jon Gregg jonrgr...@gmail.com wrote:
OK that worked and getting close here ... the job ran successfully for a
bit and I got output for the first couple buckets before getting a
java.lang.Exception: Could not compute split, block input-0-1423593163000
not found error.
So I bumped up the memory at the command line from 2 GB to 5 GB,
OK I tried that, but how do I convert an RDD to a Set that I can then
broadcast and cache?
val badIPs = sc.textFile("hdfs:///user/jon/" + "badfullIPs.csv")
val badIPsLines = badIPs.getLines
val badIpSet = badIPsLines.toSet
val badIPsBC = sc.broadcast(badIpSet)
produces the
You can call collect() to pull the contents of an RDD into the driver:
val badIPsLines = badIPs.collect()
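A minimal sketch of the driver-side step being described: collect() brings the RDD's lines back to the driver as an Array[String], and toSet then gives O(1) membership checks for filtering. The sample IPs and the events sequence below are illustrative stand-ins; on a cluster the Set would then be wrapped with sc.broadcast:

```scala
// Stand-in for badIPs.collect() on a real SparkContext
val badIPsLines: Array[String] = Array("1.2.3.4", "5.6.7.8")
val badIpSet: Set[String] = badIPsLines.toSet
// On Spark: val badIPsBC = sc.broadcast(badIpSet)
// then filter with badIPsBC.value.contains inside transformations.
val events = Seq("9.9.9.9", "1.2.3.4", "10.0.0.1")
val clean = events.filterNot(badIpSet.contains)
```

Broadcasting the Set (rather than the Array) keeps the per-record lookup constant-time on each executor.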
On Fri, Feb 6, 2015 at 12:19 PM, Jon Gregg jonrgr...@gmail.com wrote:
I have a file badFullIPs.csv of bad IP addresses used for filtering. In
yarn-client mode, I simply read it off the edge node, transform it, and then
broadcast it:
val badIPs = fromFile(edgeDir + "badfullIPs.csv")
val badIPsLines = badIPs.getLines
val badIpSet = badIPsLines.toSet
Hi Jon,
You'll need to put the file on HDFS (or whatever distributed filesystem
you're running on) and load it from there.
-Sandy
On Thu, Feb 5, 2015 at 3:18 PM, YaoPau jonrgr...@gmail.com wrote: