Re: hivecontext error

2016-06-14 Thread Ted Yu
Which release of Spark are you using?

Can you show the full error trace?

Thanks

On Tue, Jun 14, 2016 at 6:33 PM, Tejaswini Buche <
tejaswini.buche0...@gmail.com> wrote:

> I am trying to use HiveContext in Spark. The following statements are
> running fine:
>
> from pyspark.sql import HiveContext
> sqlContext = HiveContext(sc)
>
> But, when I run the statement below,
>
> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>
> I get the following error:
>
> Java Package object not callable
>
> What could be the problem?
> Thanks
>


hivecontext error

2016-06-14 Thread Tejaswini Buche
I am trying to use HiveContext in Spark. The following statements are
running fine:

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

But, when I run the statement below,

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

I get the following error:

Java Package object not callable

What could be the problem?
Thanks


Re: SPARK SQL HiveContext Error

2016-03-01 Thread Gourav Sengupta
Hi Mich,
Thanks a ton for your kind response. This error was happening because the
derby classes were being loaded more than once.

In my second email, I listed the steps I took to resolve the issue.


Thanks and Regards,
Gourav

On Tue, Mar 1, 2016 at 8:54 PM, Mich Talebzadeh 
wrote:

> Hi Gourav,
>
> Did you modify the following line in your code?
>
>  val conf = new
> SparkConf().setAppName("IdeaProjects").setMaster("local[*]").set("spark.driver.allowMultipleContexts",
> "true")
>
> I checked every line in your code; it works fine in spark-shell with the
> following package added:
>
> spark-shell --master spark://50.140.197.217:7077 --packages
> amplab:succinct:0.1.6
>
> Can you explain how it worked?
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 1 March 2016 at 18:20, Gourav Sengupta 
> wrote:
>
>> Hi,
>>
>> FIRST ATTEMPT:
>> I used build.sbt in IntelliJ, and it gave me nightmares with several
>> incompatibility and library issues, even though the sbt version was
>> compatible with the Scala version.
>>
>> SECOND ATTEMPT:
>> I created a new project with no entries in the build.sbt file and imported
>> all of the $SPARK_HOME/lib/*jar files into the project. This caused the
>> issues I reported earlier.
>>
>> FINAL ATTEMPT:
>> I removed from the dependencies all of the imported files that had the word
>> derby in their names, and this resolved the issue.
>>
>> Please note that the following jars were included in the library folder in
>> addition to the ones usually supplied with the Spark distribution:
>> 1. ojdbc7.jar
>> 2. spark-csv***jar file
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Tue, Mar 1, 2016 at 5:19 PM, Gourav Sengupta <
>> gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am getting the error "java.lang.SecurityException: sealing violation:
>>> can't seal package org.apache.derby.impl.services.locks: already loaded"
>>> after running the following code in Scala.
>>>
>>> I do not have any other instances of SparkContext running on my system.
>>>
>>> I would be grateful if anyone could kindly help me out.
>>>
>>>
>>> Environment:
>>> Spark: 1.6
>>> OS: Mac OS X
>>>
>>> 
>>>
>>> import org.apache.spark.SparkContext
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.sql.Row
>>> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.sql.types._
>>> import org.apache.spark.sql.SQLContext
>>>
>>> // Import SuccinctRDD
>>> import edu.berkeley.cs.succinct._
>>>
>>> object test1 {
>>>   def main(args: Array[String]) {
>>> //the below line returns nothing
>>> println(SparkContext.jarOfClass(this.getClass).toString())
>>> val logFile = "/tmp/README.md" // Should be some file on your system
>>>
>>> val conf = new 
>>> SparkConf().setAppName("IdeaProjects").setMaster("local[*]")
>>> val sc = new SparkContext(conf)
>>> val logData = sc.textFile(logFile, 2).cache()
>>> val numAs = logData.filter(line => line.contains("a")).count()
>>> val numBs = logData.filter(line => line.contains("b")).count()
>>> println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
>>>
>>>
>>> // Create a Spark RDD as a collection of articles; ctx is the 
>>> SparkContext
>>> val articlesRDD = sc.textFile("/tmp/README.md").map(_.getBytes)
>>>
>>> // Compress the Spark RDD into a Succinct Spark RDD, and persist it in 
>>> memory
>>> // Note that this is a time consuming step (usually at 8GB/hour/core) 
>>> since data needs to be compressed.
>>> // We are actively working on making this step faster.
>>> val succinctRDD = articlesRDD.succinct.persist()
>>>
>>>
>>> // SuccinctRDD supports a set of powerful primitives directly on 
>>> compressed RDD
>>> // Let us start by counting the number of occurrences of "Berkeley" 
>>> across all Wikipedia articles
>>> val count = succinctRDD.count("the")
>>>
>>> // Now suppose we want to find all offsets in the collection at which 
>>> "Berkeley" occurs; and
>>> // create an RDD containing all resulting offsets
>>> val offsetsRDD = succinctRDD.search("and")
>>>
>>> // Let us look at the first ten results in the above RDD
>>> val offsets = offsetsRDD.take(10)
>>>
>>> // Finally, let us extract 20 bytes before and after one of the 
>>> occurrences of "Berkeley"
>>> val offset = offsets(0)
>>> val data = succinctRDD.extract(offset - 20, 40)
>>>
>>> println(data)
>>> println(">>>")
>>>
>>>
>>> // Create a schema
>>> val citySchema = StructType(Seq(
>>>   StructField("Name", StringType, false),
>>>   StructField("Length", IntegerType, true),
>>>   StructField("Area", DoubleType, false),
>>>   

Re: SPARK SQL HiveContext Error

2016-03-01 Thread Gourav Sengupta
Hi,

FIRST ATTEMPT:
I used build.sbt in IntelliJ, and it gave me nightmares with several
incompatibility and library issues, even though the sbt version was
compatible with the Scala version.

SECOND ATTEMPT:
I created a new project with no entries in the build.sbt file and imported
all of the $SPARK_HOME/lib/*jar files into the project. This caused the
issues I reported earlier.

FINAL ATTEMPT:
I removed from the dependencies all of the imported files that had the word
derby in their names, and this resolved the issue.
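
For anyone declaring the same dependencies through build.sbt instead of
importing the jars by hand, I believe the equivalent fix would be an
exclusion rule along the following lines (a sketch only; the coordinates
and versions are illustrative, not the exact ones from my project):

// build.sbt sketch: keep a duplicate copy of derby off the classpath by
// excluding it from the Spark artifacts (versions are illustrative)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0",
  "org.apache.spark" %% "spark-sql"  % "1.6.0",
  ("org.apache.spark" %% "spark-hive" % "1.6.0")
    .excludeAll(ExclusionRule(organization = "org.apache.derby"))
)

The intent is the same as the manual fix above: make sure only one copy of
derby ends up on the classpath.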

Please note that the following jars were included in the library folder in
addition to the ones usually supplied with the Spark distribution:
1. ojdbc7.jar
2. spark-csv***jar file


Regards,
Gourav Sengupta

On Tue, Mar 1, 2016 at 5:19 PM, Gourav Sengupta 
wrote:

> Hi,
>
> I am getting the error "java.lang.SecurityException: sealing violation:
> can't seal package org.apache.derby.impl.services.locks: already loaded"
> after running the following code in Scala.
>
> I do not have any other instances of SparkContext running on my system.
>
> I would be grateful if anyone could kindly help me out.
>
>
> Environment:
> Spark: 1.6
> OS: Mac OS X
>
> 
>
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.SQLContext
>
> // Import SuccinctRDD
> import edu.berkeley.cs.succinct._
>
> object test1 {
>   def main(args: Array[String]) {
> //the below line returns nothing
> println(SparkContext.jarOfClass(this.getClass).toString())
> val logFile = "/tmp/README.md" // Should be some file on your system
>
> val conf = new 
> SparkConf().setAppName("IdeaProjects").setMaster("local[*]")
> val sc = new SparkContext(conf)
> val logData = sc.textFile(logFile, 2).cache()
> val numAs = logData.filter(line => line.contains("a")).count()
> val numBs = logData.filter(line => line.contains("b")).count()
> println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
>
>
> // Create a Spark RDD as a collection of articles; ctx is the SparkContext
> val articlesRDD = sc.textFile("/tmp/README.md").map(_.getBytes)
>
> // Compress the Spark RDD into a Succinct Spark RDD, and persist it in 
> memory
> // Note that this is a time consuming step (usually at 8GB/hour/core) 
> since data needs to be compressed.
> // We are actively working on making this step faster.
> val succinctRDD = articlesRDD.succinct.persist()
>
>
> // SuccinctRDD supports a set of powerful primitives directly on 
> compressed RDD
> // Let us start by counting the number of occurrences of "Berkeley" 
> across all Wikipedia articles
> val count = succinctRDD.count("the")
>
> // Now suppose we want to find all offsets in the collection at which 
> "Berkeley" occurs; and
> // create an RDD containing all resulting offsets
> val offsetsRDD = succinctRDD.search("and")
>
> // Let us look at the first ten results in the above RDD
> val offsets = offsetsRDD.take(10)
>
> // Finally, let us extract 20 bytes before and after one of the 
> occurrences of "Berkeley"
> val offset = offsets(0)
> val data = succinctRDD.extract(offset - 20, 40)
>
> println(data)
> println(">>>")
>
>
> // Create a schema
> val citySchema = StructType(Seq(
>   StructField("Name", StringType, false),
>   StructField("Length", IntegerType, true),
>   StructField("Area", DoubleType, false),
>   StructField("Airport", BooleanType, true)))
>
> // Create an RDD of Rows with some data
> val cityRDD = sc.parallelize(Seq(
>   Row("San Francisco", 12, 44.52, true),
>   Row("Palo Alto", 12, 22.33, false),
>   Row("Munich", 8, 3.14, true)))
>
>
> val hiveContext = new HiveContext(sc)
>
> //val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
>   }
> }
>
>
> -
>
>
>
> Regards,
> Gourav Sengupta
>


SPARK SQL HiveContext Error

2016-03-01 Thread Gourav Sengupta
Hi,

I am getting the error "java.lang.SecurityException: sealing violation:
can't seal package org.apache.derby.impl.services.locks: already loaded"
after running the following code in Scala.

I do not have any other instances of SparkContext running on my system.

I would be grateful if anyone could kindly help me out.


Environment:
Spark: 1.6
OS: Mac OS X



import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext

// Import SuccinctRDD
import edu.berkeley.cs.succinct._

object test1 {
  def main(args: Array[String]) {
    // The below line returns nothing
    println(SparkContext.jarOfClass(this.getClass).toString())
    val logFile = "/tmp/README.md" // Should be some file on your system

    val conf = new SparkConf().setAppName("IdeaProjects").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

    // Create a Spark RDD as a collection of articles; sc is the SparkContext
    val articlesRDD = sc.textFile("/tmp/README.md").map(_.getBytes)

    // Compress the Spark RDD into a Succinct Spark RDD and persist it in memory.
    // Note that this is a time-consuming step (usually at 8GB/hour/core)
    // since the data needs to be compressed.
    // We are actively working on making this step faster.
    val succinctRDD = articlesRDD.succinct.persist()

    // SuccinctRDD supports a set of powerful primitives directly on the compressed RDD.
    // Start by counting the number of occurrences of "the" across the file
    val count = succinctRDD.count("the")

    // Now find all offsets in the collection at which "and" occurs, and
    // create an RDD containing all resulting offsets
    val offsetsRDD = succinctRDD.search("and")

    // Look at the first ten results in the above RDD
    val offsets = offsetsRDD.take(10)

    // Finally, extract 20 bytes before and after one of the occurrences
    val offset = offsets(0)
    val data = succinctRDD.extract(offset - 20, 40)

    println(data)
    println(">>>")

    // Create a schema
    val citySchema = StructType(Seq(
      StructField("Name", StringType, false),
      StructField("Length", IntegerType, true),
      StructField("Area", DoubleType, false),
      StructField("Airport", BooleanType, true)))

    // Create an RDD of Rows with some data
    val cityRDD = sc.parallelize(Seq(
      Row("San Francisco", 12, 44.52, true),
      Row("Palo Alto", 12, 22.33, false),
      Row("Munich", 8, 3.14, true)))

    val hiveContext = new HiveContext(sc)

    // val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  }
}


-



Regards,
Gourav Sengupta


HiveContext error

2015-08-05 Thread Stefan Panayotov
Hello,
 
I am trying to define an external Hive table from Spark HiveContext like the 
following:
 
import org.apache.spark.sql.hive.HiveContext
val hiveCtx = new HiveContext(sc)
 
hiveCtx.sql(s"""CREATE EXTERNAL TABLE IF NOT EXISTS Rentrak_Ratings (Version string, Gen_Date string, Market_Number string, Market_Name string, Time_Zone string, Number_Households string,
 | DateTime string, Program_Start_Time string, Program_End_Time string, Station string, Station_Name string, Call_Sign string, Network_Name string, Program string,
 | Series_Name string, Series_Number string, Episode_Number string, Episode_Title string, Demographic string, Demographic_Name string, HHUniverse string,
 | Share_15min_Segment string, PHUT_15min_Segment string, Rating_15min_Segment string, AV_Audience_15min_Segment string)
 | PARTITIONED BY (year INT, month INT)
 | ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'""".stripMargin)

And I am getting the following error:
 
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:324)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:292)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:103)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC.<init>(<console>:47)
    at $iwC.<init>(<console>:49)
    at <init>(<console>:51)
    at .<init>(<console>:55)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)

 
Can anybody help please?


Stefan Panayotov, PhD 
Home: 610-355-0919 
Cell: 610-517-5586 
email: spanayo...@msn.com 
spanayo...@outlook.com 
spanayo...@comcast.net