Hi Ted / Holden,

I read a section in the book Learning Spark that advises against
passing entire objects to Spark instead of just functions (ref: page 30,
"Passing Functions to Spark").

Doesn't the approach above go against that advice? It would be great to
hear your explanation.
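The Learning Spark advice is about closure capture: referencing a field of an object inside a closure really references `this.field`, so Spark must serialize the whole enclosing object. A minimal plain-Scala sketch of the pitfall and the usual fix (class and method names are invented for illustration; no Spark needed, since plain JVM serialization shows the same capture):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Deliberately NOT Serializable: Spark would fail to ship a closure that captures it.
class SearchFunctions(val query: String) {
  // Referencing the field `query` means `this.query`, so this closure
  // captures the whole (non-serializable) SearchFunctions instance.
  def badFilter: String => Boolean = (s: String) => s.contains(query)

  // Copying the field into a local val first means only the String is captured.
  def goodFilter: String => Boolean = {
    val q = query
    (s: String) => s.contains(q)
  }
}

object ClosureDemo {
  // Returns true if `f` (and everything it captures) can be Java-serialized.
  def serializable(f: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val sf = new SearchFunctions("spark")
    println(serializable(sf.badFilter))  // fails: captures the enclosing instance
    println(serializable(sf.goodFilter)) // succeeds: captures only the String
  }
}
```

This is why passing `sc` and `rs` explicitly as method parameters, as done below in the thread, sidesteps the problem on the driver side: nothing forces the enclosing class into the shipped closure.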


Regards,
Gourav Sengupta

On Sun, Mar 6, 2016 at 10:57 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks for this tip
>
> The way I do it is to pass the SparkContext "sc" to the method
> firstquery.firstquerym by calling the following:
>
> val firstquery = new FirstQuery
> firstquery.firstquerym(sc, rs)
>
>
> And creating the method as follows:
>
> class FirstQuery {
>   def firstquerym(sc: org.apache.spark.SparkContext, rs: org.apache.spark.sql.DataFrame) {
>     val sqlContext = SQLContext.getOrCreate(sc)
>     println("\nfirst query at")
>     sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>     rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>   }
> }
>
> This works. However, it seems I cannot invoke getOrCreate without
> passing sc.
>
> Is this the way you were implying? Also, why is "sc" not available for
> the life of the JVM, please?
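One option, assuming Spark 1.5 or later: the active SparkContext itself can be retrieved with SparkContext.getOrCreate, so neither context need be passed around at all. A sketch only (it requires Spark on the classpath, so it is not runnable standalone; the method and column names follow the thread's example):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SQLContext}

class FirstQuery {
  def firstquerym(rs: DataFrame): Unit = {
    // Retrieve the JVM's active SparkContext (or create one from the default
    // conf), then the singleton SQLContext, without either being passed in.
    val sc = SparkContext.getOrCreate()
    val sqlContext = SQLContext.getOrCreate(sc)
    sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')")
      .collect().foreach(println)
    rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
  }
}
```

Both getOrCreate methods return the one instance that is active for the life of the JVM, which is why "sc" does not need to be a global: any code that needs it can look it up.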
>
> Thanks
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 6 March 2016 at 01:25, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Looking at the methods you call on HiveContext, they seem to belong
>> to SQLContext.
>>
>> You can use the following method of SQLContext in FirstQuery to
>> retrieve the SQLContext:
>>
>>   def getOrCreate(sparkContext: SparkContext): SQLContext = {
>>
>> FYI
>>
>> On Sat, Mar 5, 2016 at 3:37 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> I managed to sort this one out.
>>>
>>> The class should be defined as below, with its method accepting two
>>> input parameters, the HiveContext and rs:
>>>
>>> class FirstQuery {
>>>   def firstquerym(HiveContext: org.apache.spark.sql.hive.HiveContext, rs: org.apache.spark.sql.DataFrame) {
>>>     println("\nfirst query at")
>>>     HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>     rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>>>   }
>>> }
>>>
>>> and called from the main method as follows:
>>>
>>> val firstquery = new FirstQuery
>>> firstquery.firstquerym(HiveContext, rs)
>>>
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 5 March 2016 at 20:56, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I can use sbt to compile and run the following code. It works without
>>>> any problem.
>>>>
>>>> I want to divide this into the object and another class. I would like
>>>> to build the result set by joining the tables, identified by the
>>>> DataFrame 'rs', and then call the method "firstquerym" in the class
>>>> FirstQuery to do the calculation identified as "rs1".
>>>>
>>>> Now it needs "rs" to be available in class FirstQuery. Two questions,
>>>> please:
>>>>
>>>>
>>>>    1. How can I pass rs to class FirstQuery?
>>>>    2. Is there a better way of modularising this work, so that methods
>>>>    defined in another class can be called from the main method?
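The general pattern behind both questions can be illustrated without Spark at all: pass the data a method needs explicitly as a parameter, rather than assuming it is in scope. A minimal plain-Scala analogue (names are invented; the case class stands in for a DataFrame row, and the sort-and-take mirrors the "rs1" query):

```scala
// Plain-Scala analogue of passing `rs` into FirstQuery: the data arrives
// as a method parameter instead of being assumed to exist in scope.
object ModularityDemo {
  case class SalesRow(month: String, channel: String, totalSales: Double)

  class FirstQuery {
    // Analogue of rs.orderBy("calendar_month_desc","channel_desc").take(5)
    def firstquerym(rs: Seq[SalesRow]): Seq[SalesRow] =
      rs.sortBy(r => (r.month, r.channel)).take(5)
  }

  def main(args: Array[String]): Unit = {
    val rs = Seq(
      SalesRow("2016-02", "Internet", 100.0),
      SalesRow("2016-01", "Direct", 250.0),
      SalesRow("2016-01", "Internet", 75.0)
    )
    new FirstQuery().firstquerym(rs).foreach(println)
  }
}
```

The same shape scales to the Spark version: main builds `rs` once, and every helper class receives it (plus whichever context it needs) as arguments.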
>>>>
>>>> Thanks
>>>>
>>>> import org.apache.spark.SparkContext
>>>> import org.apache.spark.SparkConf
>>>> import org.apache.spark.sql.Row
>>>> import org.apache.spark.sql.hive.HiveContext
>>>> import org.apache.spark.sql.types._
>>>> import org.apache.spark.sql.SQLContext
>>>> import org.apache.spark.sql.functions._
>>>> //
>>>> object Harness4 {
>>>>   def main(args: Array[String]) {
>>>>     val conf = new SparkConf()
>>>>       .setAppName("Harness4")
>>>>       .setMaster("local[*]")
>>>>       .set("spark.driver.allowMultipleContexts", "true")
>>>>     val sc = new SparkContext(conf)
>>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>>     // Note: the implicits can be imported only after the SQLContext instance exists
>>>>     import sqlContext.implicits._
>>>>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>     println("\nStarted at")
>>>>     HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>>     HiveContext.sql("use oraclehadoop")
>>>>     val s = HiveContext.table("sales").select("AMOUNT_SOLD", "TIME_ID", "CHANNEL_ID")
>>>>     val c = HiveContext.table("channels").select("CHANNEL_ID", "CHANNEL_DESC")
>>>>     val t = HiveContext.table("times").select("TIME_ID", "CALENDAR_MONTH_DESC")
>>>>     println("\ncreating data set at")
>>>>     HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>>     val rs = s.join(t, "time_id")
>>>>       .join(c, "channel_id")
>>>>       .groupBy("calendar_month_desc", "channel_desc")
>>>>       .agg(sum("amount_sold").as("TotalSales"))
>>>>     // println("\nfirst query at")
>>>>     // rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>>>>     val firstquery = new FirstQuery
>>>>     firstquery.firstquerym
>>>>   }
>>>> }
>>>> //
>>>> class FirstQuery {
>>>>   def firstquerym {
>>>>     // Problem: HiveContext and rs are not in scope inside this class
>>>>     println("\nfirst query at")
>>>>     HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>>     rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>>>>   }
>>>> }
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>
>>>
>>
>
