Re: SparkContext UI

2014-10-31 Thread Sean Owen
No, empty parens do not matter when calling no-arg methods in Scala.
This invocation should work as-is and should result in the RDD showing
in Storage. I see that when I run it right now.

Since it really does/should work, I'd look at other possibilities --
is it maybe taking a short time to start caching? looking at a
different/old Storage tab?
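
A minimal way to double-check from the PySpark shell, as a sketch (is_cached and getStorageLevel() are standard RDD attributes; "somefile" is just the placeholder path from the thread):

data = sc.textFile("somefile")
data.cache()                   # marks the RDD for caching; nothing is stored yet
data.count()                   # the first action actually materializes the cached blocks
print(data.is_cached)          # True once cache()/persist() has been called
print(data.getStorageLevel())  # shows the storage level in effect

If is_cached is True and the count has completed, the RDD should be listed under the Storage tab at localhost:4040.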

On Fri, Oct 31, 2014 at 1:17 AM, Sameer Farooqui same...@databricks.com wrote:
 Hi Stuart,

 You're close!

 Just add a () after the cache, like: data.cache()

 ...and then run the .count() action on it and you should be good to see it
 in the Storage UI!


 - Sameer

 On Thu, Oct 30, 2014 at 4:50 PM, Stuart Horsman stuart.hors...@gmail.com
 wrote:

 Sorry too quick to pull the trigger on my original email.  I should have
 added that I've tried using persist() and cache() but no joy.

 I'm doing this:

 data = sc.textFile(somedata)

 data.cache

 data.count()

 but I still can't see anything in the storage?



 On 31 October 2014 10:42, Sameer Farooqui same...@databricks.com wrote:

 Hey Stuart,

 The RDD won't show up under the Storage tab in the UI until it's been
 cached. Basically Spark doesn't know what the RDD will look like until it's
 cached, b/c up until then the RDD is just on disk (external to Spark). If
 you launch some transformations + an action on an RDD that is purely on
 disk, then Spark will read it from disk, compute against it and then write
 the results back to disk or show you the results at the scala/python shells.
 But when you run Spark workloads against purely on disk files, the RDD won't
 show up in Spark's Storage UI. Hope that makes sense...

 - Sameer

 On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman
 stuart.hors...@gmail.com wrote:

 Hi All,

 When I load an RDD with:

 data = sc.textFile(somefile)

 I don't see the resulting RDD in the SparkContext gui on localhost:4040
 in /storage.

 Is there something special I need to do to allow me to view this?  I tried
 both the scala and python shells but got the same result.

 Thanks

 Stuart





-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: SparkContext UI

2014-10-31 Thread Stuart Horsman
Hi Sean/Sameer,

It seems you're both right.  In the python shell I need to explicitly call
data.cache() with the empty parens, then run an action, and it appears in the
storage tab.  Using the scala shell I can just call data.cache without the
parens, run an action, and that works.

Thanks for your help.

Stu
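
For reference, a minimal PySpark sketch of the difference ("somedata" is just the placeholder path used earlier in the thread):

data = sc.textFile("somedata")
data.cache      # in Python this only references the bound method; nothing is called, so nothing is cached
data.cache()    # this actually marks the RDD to be cached
data.count()    # the first action materializes the cached partitions, and the RDD then shows up under Storage

In the Scala shell, by contrast, data.cache without the parens really does invoke the method, which is why it works there.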

On 31 October 2014 19:19, Sean Owen so...@cloudera.com wrote:

 No, empty parens do not matter when calling no-arg methods in Scala.
 This invocation should work as-is and should result in the RDD showing
 in Storage. I see that when I run it right now.

 Since it really does/should work, I'd look at other possibilities --
 is it maybe taking a short time to start caching? looking at a
 different/old Storage tab?

 On Fri, Oct 31, 2014 at 1:17 AM, Sameer Farooqui same...@databricks.com
 wrote:
  Hi Stuart,
 
  You're close!
 
  Just add a () after the cache, like: data.cache()
 
  ...and then run the .count() action on it and you should be good to see
 it
  in the Storage UI!
 
 
  - Sameer
 
  On Thu, Oct 30, 2014 at 4:50 PM, Stuart Horsman 
 stuart.hors...@gmail.com
  wrote:
 
  Sorry too quick to pull the trigger on my original email.  I should have
  added that I've tried using persist() and cache() but no joy.
 
  I'm doing this:
 
  data = sc.textFile(somedata)
 
  data.cache
 
  data.count()
 
  but I still can't see anything in the storage?
 
 
 
  On 31 October 2014 10:42, Sameer Farooqui same...@databricks.com
 wrote:
 
  Hey Stuart,
 
  The RDD won't show up under the Storage tab in the UI until it's been
  cached. Basically Spark doesn't know what the RDD will look like until
 it's
  cached, b/c up until then the RDD is just on disk (external to Spark).
 If
  you launch some transformations + an action on an RDD that is purely on
  disk, then Spark will read it from disk, compute against it and then
 write
  the results back to disk or show you the results at the scala/python
 shells.
  But when you run Spark workloads against purely on disk files, the RDD
 won't
  show up in Spark's Storage UI. Hope that makes sense...
 
  - Sameer
 
  On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman
  stuart.hors...@gmail.com wrote:
 
  Hi All,
 
  When I load an RDD with:
 
  data = sc.textFile(somefile)
 
  I don't see the resulting RDD in the SparkContext gui on
 localhost:4040
  in /storage.
 
  Is there something special I need to do to allow me to view this?  I tried
  both the scala and python shells but got the same result.
 
  Thanks
 
  Stuart
 
 
 
 



SparkContext UI

2014-10-30 Thread Stuart Horsman
Hi All,

When I load an RDD with:

data = sc.textFile(somefile)

I don't see the resulting RDD in the SparkContext gui on localhost:4040 in
/storage.

Is there something special I need to do to allow me to view this?  I tried
both the scala and python shells but got the same result.

Thanks

Stuart


Re: SparkContext UI

2014-10-30 Thread Sameer Farooqui
Hey Stuart,

The RDD won't show up under the Storage tab in the UI until it's been
cached. Basically Spark doesn't know what the RDD will look like until it's
cached, b/c up until then the RDD is just on disk (external to Spark). If
you launch some transformations + an action on an RDD that is purely on
disk, then Spark will read it from disk, compute against it and then write
the results back to disk or show you the results at the scala/python
shells. But when you run Spark workloads against purely on disk files, the
RDD won't show up in Spark's Storage UI. Hope that makes sense...

- Sameer
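
As a rough illustration of the point above, a PySpark sketch ("somefile" is the placeholder path from the thread, and the filter is just an arbitrary example transformation):

# Without caching: Spark re-reads the file from disk for each action,
# and nothing is listed under the Storage tab.
data = sc.textFile("somefile")
data.filter(lambda line: "error" in line).count()

# With caching: after the first action runs, the cached partitions
# are listed under the Storage tab.
data = sc.textFile("somefile")
data.cache()
data.count()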

On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman stuart.hors...@gmail.com
wrote:

 Hi All,

 When I load an RDD with:

 data = sc.textFile(somefile)

 I don't see the resulting RDD in the SparkContext gui on localhost:4040 in
 /storage.

 Is there something special I need to do to allow me to view this?  I tried
 both the scala and python shells but got the same result.

 Thanks

 Stuart



Re: SparkContext UI

2014-10-30 Thread Stuart Horsman
Sorry too quick to pull the trigger on my original email.  I should have
added that I've tried using persist() and cache() but no joy.

I'm doing this:

data = sc.textFile(somedata)

data.cache

data.count()

but I still can't see anything in the storage?



On 31 October 2014 10:42, Sameer Farooqui same...@databricks.com wrote:

 Hey Stuart,

 The RDD won't show up under the Storage tab in the UI until it's been
 cached. Basically Spark doesn't know what the RDD will look like until it's
 cached, b/c up until then the RDD is just on disk (external to Spark). If
 you launch some transformations + an action on an RDD that is purely on
 disk, then Spark will read it from disk, compute against it and then write
 the results back to disk or show you the results at the scala/python
 shells. But when you run Spark workloads against purely on disk files, the
 RDD won't show up in Spark's Storage UI. Hope that makes sense...

 - Sameer

 On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman stuart.hors...@gmail.com
 wrote:

 Hi All,

 When I load an RDD with:

 data = sc.textFile(somefile)

 I don't see the resulting RDD in the SparkContext gui on localhost:4040
 in /storage.

 Is there something special I need to do to allow me to view this?  I tried
 both the scala and python shells but got the same result.

 Thanks

 Stuart





Re: SparkContext UI

2014-10-30 Thread Sameer Farooqui
Hi Stuart,

You're close!

Just add a () after the cache, like: data.cache()

...and then run the .count() action on it and you should be good to see it
in the Storage UI!


- Sameer

On Thu, Oct 30, 2014 at 4:50 PM, Stuart Horsman stuart.hors...@gmail.com
wrote:

 Sorry too quick to pull the trigger on my original email.  I should have
 added that I've tried using persist() and cache() but no joy.

 I'm doing this:

 data = sc.textFile(somedata)

 data.cache

 data.count()

 but I still can't see anything in the storage?



 On 31 October 2014 10:42, Sameer Farooqui same...@databricks.com wrote:

 Hey Stuart,

 The RDD won't show up under the Storage tab in the UI until it's been
 cached. Basically Spark doesn't know what the RDD will look like until it's
 cached, b/c up until then the RDD is just on disk (external to Spark). If
 you launch some transformations + an action on an RDD that is purely on
 disk, then Spark will read it from disk, compute against it and then write
 the results back to disk or show you the results at the scala/python
 shells. But when you run Spark workloads against purely on disk files, the
 RDD won't show up in Spark's Storage UI. Hope that makes sense...

 - Sameer

 On Thu, Oct 30, 2014 at 4:30 PM, Stuart Horsman stuart.hors...@gmail.com
  wrote:

 Hi All,

 When I load an RDD with:

 data = sc.textFile(somefile)

 I don't see the resulting RDD in the SparkContext gui on localhost:4040
 in /storage.

 Is there something special I need to do to allow me to view this?  I tried
 both the scala and python shells but got the same result.

 Thanks

 Stuart