Re: How can I pass a Data Frame from object to another class

2016-03-06 Thread Mich Talebzadeh
It would be interesting to know why these contexts are not available in the
JVM outside of the class in which they were instantiated (created).

For example, we could initialize an application in the main method as
follows (local mode with 12 threads):

  val conf = new SparkConf().
   setAppName("Harness4").
   setMaster("local[12]").
   set("spark.driver.allowMultipleContexts", "true")
  val sc = new SparkContext(conf)

So any class that follows should be able to see "sc", correct? However, I
have to pass it as a parameter to the method of that class:

class FirstQuery {
   def firstquerym(sc: org.apache.spark.SparkContext, rs:
org.apache.spark.sql.DataFrame) {
   val sqlContext = SQLContext.getOrCreate(sc)
...
  }
}

Otherwise it throws an error that "sc" does not exist.
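
One way to avoid threading "sc" through every method signature is to keep
the context in a singleton object that any class on the driver can reach.
A minimal sketch, assuming Spark 1.x (the name SparkHolder is made up here
for illustration, it is not from this thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// "sc" declared inside main() is only a local val, so nothing outside
// main() can see it. A singleton object gives the rest of the driver JVM
// a well-known place to find the same context.
object SparkHolder {
  lazy val sc: SparkContext = new SparkContext(
    new SparkConf().setAppName("Harness4").setMaster("local[12]"))
  lazy val sqlContext: SQLContext = SQLContext.getOrCreate(sc)
}

class FirstQuery {
  def firstquerym(rs: org.apache.spark.sql.DataFrame) {
    val sqlContext = SparkHolder.sqlContext // no sc parameter needed
    sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')")
      .collect.foreach(println)
    rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
  }
}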

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Re: How can I pass a Data Frame from object to another class

2016-03-06 Thread Gourav Sengupta
Hi Ted/ Holden,

I read a section in the book Learning Spark which advises against passing
entire objects to Spark rather than just functions (ref: page 30, "Passing
Functions to Spark").

Is the above way of solving the problem not going against that advice? It
would be exciting to see your kind explanation.


Regards,
Gourav Sengupta
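
For context, the warning in Learning Spark is about closures that are
serialised and shipped to the executors: referencing a method there drags
the whole enclosing object across the wire. A sketch of the difference,
with illustrative names (not from this thread):

import org.apache.spark.rdd.RDD

class SearchFunctions(val query: String) {
  def isMatch(s: String): Boolean = s.contains(query)

  // rdd.filter(isMatch) is really rdd.filter(this.isMatch), so the whole
  // SearchFunctions object has to be serialised with the task (and the
  // class would need to extend Serializable for this to even run)
  def getMatchesBad(rdd: RDD[String]): RDD[String] = rdd.filter(isMatch)

  // copying the field into a local val first means the closure captures
  // only the String, not the enclosing object
  def getMatchesGood(rdd: RDD[String]): RDD[String] = {
    val q = query
    rdd.filter(_.contains(q))
  }
}

Passing "sc" or a DataFrame as a plain method parameter on the driver, as
in the code above, never leaves the driver, so it is a separate question
from what gets shipped inside closures.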


Re: How can I pass a Data Frame from object to another class

2016-03-06 Thread Mich Talebzadeh
Thanks for this tip

The way I do it is to pass the SparkContext "sc" to the method
firstquery.firstquerym by calling the following:

val firstquery =  new FirstQuery
firstquery.firstquerym(sc, rs)


And creating the method as follows:

class FirstQuery {
   def firstquerym(sc: org.apache.spark.SparkContext, rs:
org.apache.spark.sql.DataFrame) {
   val sqlContext = SQLContext.getOrCreate(sc)
   println ("\nfirst query at"); sqlContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)
  val rs1 =
rs.orderBy("calendar_month_desc","channel_desc").take(5).foreach(println)
  }
}

This works. However, it seems I cannot invoke getOrCreate without passing sc.

Is this the way you were implying? Also, why is "sc" not available for the
life of the JVM, please?
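
(For what it is worth, in the Spark 1.x API the only public variant is
getOrCreate(sparkContext: SparkContext), so the context has to be supplied;
the result is then cached, so repeated calls are cheap. A minimal check,
assuming Spark 1.6, with an illustrative object name:)

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object GetOrCreateSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(
      new SparkConf().setAppName("GetOrCreateSketch").setMaster("local[2]"))
    val a = SQLContext.getOrCreate(sc) // creates the SQLContext on first call
    val b = SQLContext.getOrCreate(sc) // returns the same cached instance
    println(a eq b)                    // prints: true
  }
}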

Thanks




Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: How can I pass a Data Frame from object to another class

2016-03-05 Thread Ted Yu
Looking at the methods you call on HiveContext, they seem to belong
to SQLContext.

For SQLContext, you can use the following method of SQLContext in FirstQuery
to retrieve the SQLContext:

  def getOrCreate(sparkContext: SparkContext): SQLContext = {

FYI
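
(For reference, HiveContext extends SQLContext in the 1.x API, so methods
such as sql() and table() are inherited from SQLContext, and code that only
needs them can be typed against SQLContext and accept either context. A
sketch under that assumption:)

import org.apache.spark.sql.{DataFrame, SQLContext}

class FirstQuery {
  // typed against the parent class, so either a SQLContext or a
  // HiveContext can be passed in from main()
  def firstquerym(ctx: SQLContext, rs: DataFrame) {
    ctx.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')")
      .collect.foreach(println)
    rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
  }
}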

On Sat, Mar 5, 2016 at 3:37 PM, Mich Talebzadeh 
wrote:

> I managed to sort this one out.
>
> The class should be defined as below, with its method accepting two input
> parameters, HiveContext and rs:
>
> class FirstQuery {
>def firstquerym(HiveContext: org.apache.spark.sql.hive.HiveContext,
> rs: org.apache.spark.sql.DataFrame) {
>println ("\nfirst query at"); HiveContext.sql("SELECT
> FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
> ").collect.foreach(println)
>   val rs1 =
> rs.orderBy("calendar_month_desc","channel_desc").take(5).foreach(println)
>   }
> }
>
> and called from the main method as follows:
>
> val firstquery =  new FirstQuery
> firstquery.firstquerym(HiveContext, rs)
>
>
> Thanks
>


How can I pass a Data Frame from object to another class

2016-03-05 Thread Mich Talebzadeh
Hi,

I can use sbt to compile and run the following code. It works without any
problem.

I want to divide this into the object and another class. I would like to
build the result set by joining tables, identified by the DataFrame 'rs',
and then call the method "firstquerym" in the class FirstQuery to do the
calculation identified as "rs1".

Now this needs "rs" to be available in class FirstQuery. Two questions,
please:


   1. How can I pass rs to class FirstQuery?
   2. Is there a better way of modularising this work, so that I can use
   methods defined in another class, called from the main method?

Thanks

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
//
object Harness4 {
  def main(args: Array[String]) {
  val conf = new SparkConf().setAppName("Harness4").setMaster("local[*]")
    .set("spark.driver.allowMultipleContexts", "true")
  val sc = new SparkContext(conf)
  // the import of sqlContext.implicits._ below must come only after an
  // instance of org.apache.spark.sql.SQLContext has been created:
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.implicits._
  val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
println ("\nStarted at"); HiveContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)
HiveContext.sql("use oraclehadoop")
var s =
HiveContext.table("sales").select("AMOUNT_SOLD","TIME_ID","CHANNEL_ID")
val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
println ("\ncreating data set at"); HiveContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)
val rs =
s.join(t,"time_id").join(c,"channel_id").groupBy("calendar_month_desc","channel_desc").agg(sum("amount_sold").as("TotalSales"))
//println ("\nfirst query at"); HiveContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)
//val rs1 =
rs.orderBy("calendar_month_desc","channel_desc").take(5).foreach(println)
val firstquery =  new FirstQuery
firstquery.firstquerym
 }
}
//
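// NOTE: as written, HiveContext and rs are not in scope inside FirstQuery
// below -- this is exactly the problem being asked about.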
class FirstQuery {
   def firstquerym {
  println ("\nfirst query at"); HiveContext.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)
  val rs1 =
rs.orderBy("calendar_month_desc","channel_desc").take(5).foreach(println)
  }
}
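
One possible refactor, as a sketch rather than a fix from the thread
(FirstQuery2 is an illustrative name): pass the context and the DataFrame
through the constructor, so the method body compiles without referring to
names defined in main:

class FirstQuery2(hiveContext: org.apache.spark.sql.hive.HiveContext,
                  rs: org.apache.spark.sql.DataFrame) {
  def firstquerym() {
    println("\nfirst query at")
    hiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')")
      .collect.foreach(println)
    rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
  }
}

// called from main as:
//   val firstquery = new FirstQuery2(HiveContext, rs)
//   firstquery.firstquerym()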



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com