JDBC to hive batch use case in spark

2017-12-09 Thread Hokam Singh Chauhan
Hi,
I have a use case in which I want to read data from a JDBC source (Oracle)
table and write it to a Hive table on a periodic basis. I tried this using
the SQLContext to read from Oracle and the HiveContext to write the data to
Hive. The read part works fine, but when I run the save call on the
HiveContext to write the data, it throws an exception saying the table or
view does not exist, even though the table is pre-created in Hive.

Please help if anyone has tried such a scenario.
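A minimal PySpark sketch of the pattern, assuming Spark 1.x; the JDBC URL, credentials, and table names below are placeholders. One frequent cause of the "table or view does not exist" error is reading with a plain SQLContext and writing with a separate HiveContext: only the HiveContext is wired to the Hive metastore, so using a single HiveContext for both sides is worth trying:

```python
# Sketch only: read an Oracle table over JDBC and append it into a
# pre-created Hive table, using one HiveContext for both operations.
# All connection details and table names are illustrative.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="OracleToHive")
hc = HiveContext(sc)  # sees the Hive metastore, unlike a plain SQLContext

df = (hc.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
        .option("dbtable", "SRC_TABLE")
        .option("user", "scott")
        .option("password", "tiger")
        .option("driver", "oracle.jdbc.OracleDriver")
        .load())

# Append into the pre-created Hive table
df.write.mode("append").insertInto("target_db.target_table")
```

Scheduling the periodic run (cron, Oozie, etc.) sits outside this sketch.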

Thanks


How to run "merge into" ACID transaction hive query using hive java api?

2017-09-12 Thread Hokam Singh Chauhan
Please share if anyone knows how to execute a "merge into" Hive query.
Thanks,
Hokam
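In case it helps: MERGE INTO is ordinary ACID DML, so it can be sent over any HiveServer2 connection — including the Java JDBC driver (org.apache.hive.jdbc.HiveDriver) — rather than through a dedicated API. This is a hedged sketch, not a verified recipe: it assumes Hive 2.2+ with ACID enabled, a transactional bucketed ORC target table, and the PyHive client; host, database, and table names are placeholders.

```python
# Sketch: issue a MERGE INTO statement over HiveServer2 using PyHive.
# The same SQL string works unchanged from the Java JDBC driver.
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000, database="default")
cur = conn.cursor()
cur.execute(
    "MERGE INTO target t USING source s ON t.id = s.id "
    "WHEN MATCHED THEN UPDATE SET val = s.val "
    "WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val)"
)
conn.close()
```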


Re: Error getting response from spark driver rest APIs : java.lang.IncompatibleClassChangeError: Implementing class

2015-12-26 Thread Hokam Singh Chauhan
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at com.sun.jersey.api.core.ScanningResourceConfig.init(ScanningResourceConfig.java:79)
> at com.sun.jersey.api.core.PackagesResourceConfig.init(PackagesResourceConfig.java:104)
> at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:78)
> at com.sun.jersey.api.core.PackagesResourceConfig.<init>(PackagesResourceConfig.java:89)
> ... 33 more
>
> Please help.
>
> Thanks in advance.
>
> Regards,
> Rakesh
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Error-getting-response-from-spark-driver-rest-APIs-java-lang-IncompatibleClassChangeError-Implementis-tp25724.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
Thanks and Regards,
Hokam Singh Chauhan
Mobile : 09407125190


Re: REST Api not working in spark

2015-12-26 Thread Hokam Singh Chauhan
quest(AbstractHttpConnection.java:494)
>   at org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
>   at org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
>   at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644)
>   at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>   at org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>   at org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>   at org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>   at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>   at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>   at java.lang.Thread.run(Thread.java:745)
> 
> Powered by Jetty://
>
> Regards,
>
> Aman Solanki
>



-- 
Thanks and Regards,
Hokam Singh Chauhan
Mobile : 09407125190


Re: Problem of submitting Spark task to cluster from eclipse IDE on Windows

2015-12-23 Thread Hokam Singh Chauhan
Hi,

Use spark://<hostname>:7077 as the Spark master URL if you are currently
using the IP address in place of the hostname.

I faced the same issue; it was resolved by using the hostname in the Spark
master URL instead of the IP address.
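For example, in spark-defaults.conf on the driver machine (the host name and addresses here are the ones from this thread; adjust to your setup, and make sure the hostname resolves on the driver machine, e.g. via its hosts file):

```
spark.master       spark://hadoop00:7077
spark.driver.host  10.20.6.23
```

spark.driver.host matters because the cluster must be able to connect back to the submitting machine.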

Regards,
Hokam
On 23 Dec 2015 13:41, "Akhil Das"  wrote:

> You need to:
>
> 1. Make sure your local router has NAT enabled and has port-forwarded the
> networking ports listed here.
> 2. Make sure port 7077 on your cluster is accessible from your local (public)
> IP address. You can try: telnet 10.20.17.70 7077
> 3. Set spark.driver.host so that the cluster can connect back to your
> machine.
>
>
>
> Thanks
> Best Regards
>
> On Wed, Dec 23, 2015 at 10:02 AM, superbee84  wrote:
>
>> Hi All,
>>
>>I'm new to Spark. Before I describe the problem, I'd like to let you
>> know
>> the role of the machines that organize the cluster and the purpose of my
>> work. By reading and following the instructions and tutorials, I
>> successfully
>> built up a cluster with 7 CentOS-6.5 machines. I installed Hadoop 2.7.1,
>> Spark 1.5.1, Scala 2.10.4 and ZooKeeper 3.4.5 on them. The details are
>> listed as below:
>>
>>
>> Host Name | IP Address  | Hadoop 2.7.1             | Spark 1.5.1     | ZooKeeper
>> hadoop00  | 10.20.17.70 | NameNode(Active)         | Master(Active)  | none
>> hadoop01  | 10.20.17.71 | NameNode(Standby)        | Master(Standby) | none
>> hadoop02  | 10.20.17.72 | ResourceManager(Active)  | none            | none
>> hadoop03  | 10.20.17.73 | ResourceManager(Standby) | none            | none
>> hadoop04  | 10.20.17.74 | DataNode                 | Worker          | JournalNode
>> hadoop05  | 10.20.17.75 | DataNode                 | Worker          | JournalNode
>> hadoop06  | 10.20.17.76 | DataNode                 | Worker          | JournalNode
>>
>>Now my *purpose* is to develop Hadoop/Spark applications on my own
>> computer(IP: 10.20.6.23) and submit them to the remote cluster. As all the
>> other guys in our group are in the habit of eclipse on Windows, I'm trying
>> to work on this. I have successfully submitted the WordCount MapReduce job
>> to YARN and it ran smoothly through eclipse on Windows. But when I tried to
>> run the Spark WordCount, it gives me the following error in the eclipse
>> console:
>>
>> 15/12/23 11:15:30 INFO AppClient$ClientEndpoint: Connecting to master
>> spark://10.20.17.70:7077...
>> 15/12/23 11:15:50 ERROR SparkUncaughtExceptionHandler: Uncaught exception
>> in
>> thread Thread[appclient-registration-retry-thread,5,main]
>> java.util.concurrent.RejectedExecutionException: Task
>> java.util.concurrent.FutureTask@29ed85e7 rejected from
>> java.util.concurrent.ThreadPoolExecutor@28f21632[Running, pool size = 1,
>> active threads = 0, queued tasks = 0, completed tasks = 1]
>> at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown Source)
>> at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>> at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>> at java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
>> at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)
>> at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>> at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>> at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)
>> at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)
>> at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)
>> at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
>> at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>> at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
>> at

How to handle categorical variables in Spark MLlib?

2015-12-22 Thread Hokam Singh Chauhan
Hi,

We have a use case in which we need to handle categorical variables in the
SVM, regression and logistic regression models (MLlib, not ML) for scoring.

We are given the set of possible category values for each categorical
variable.

So how can the string value of a categorical variable be converted into
double values to form the features vector?

Also, how can the weight for individual categories be calculated for the
models? For example, we have Gender as a variable with categories Male and
Female, and we want to give more weight to the Female category; how can
this be accomplished?

Also, is there a way to convert string values from raw text into a features
vector (apart from the HashingTF-IDF transformation)?
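For the first two questions, the usual approach in the MLlib era is one-hot (dummy) encoding: each category gets its own double-valued slot in the feature vector, and scaling a slot's value is one simple, if crude, way to up-weight a category relative to the others. A small self-contained sketch; the category lists and weights are illustrative assumptions:

```python
# Sketch: one-hot encode a categorical string into the double-valued slots
# of a feature vector, with an optional per-category weight.
def one_hot(value, categories, weights=None):
    """Return a list of doubles: the slot for `value` gets 1.0 (or its
    weight, if a weight map is given); every other slot gets 0.0."""
    vec = [0.0] * len(categories)
    idx = categories.index(value)  # raises ValueError for unseen categories
    vec[idx] = 1.0 if weights is None else float(weights[value])
    return vec

gender_categories = ["Male", "Female"]
gender_weights = {"Male": 1.0, "Female": 2.0}  # give Female twice the weight

features = one_hot("Female", gender_categories, gender_weights)
# features == [0.0, 2.0]
```

The resulting list of doubles can be concatenated with the other numeric columns and fed to MLlib's Vectors.dense / LabeledPoint as usual.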

-- 
Thanks and Regards,
Hokam Singh Chauhan
Mobile : 09407125190


How to kill the spark job using Java API.

2015-11-20 Thread Hokam Singh Chauhan
Hi,

I have been running a Spark job on a standalone Spark cluster. I want to
kill the job using a Java API. I have the Spark job name and the Spark job
id.

The REST POST call for killing the job is not working.

If anyone has explored this, please help me out.
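One avenue to verify against your Spark version (this is an assumption, not a confirmed fix): on a standalone cluster the master exposes a REST submission endpoint on port 6066 — the same API spark-submit's --kill flag uses — which kills a whole submission by its submission/driver id, not an individual job inside a running application; for jobs within an application, SparkContext.cancelJobGroup is the documented API. A sketch of the REST route, with placeholder host and id:

```python
# Sketch: kill a standalone-mode submission via the Spark master's REST
# endpoint (default port 6066; the REST server must be enabled on the
# master, and the app must have been submitted through it).
import json
import urllib.request

def kill_submission_url(master_host, submission_id, port=6066):
    # POST to /v1/submissions/kill/<submissionId> asks the master to kill it.
    return "http://{0}:{1}/v1/submissions/kill/{2}".format(
        master_host, port, submission_id)

def kill_submission(master_host, submission_id):
    req = urllib.request.Request(
        kill_submission_url(master_host, submission_id), data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

print(kill_submission_url("10.20.17.70", "driver-20151120-0001"))
```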

-- 
Thanks and Regards,
Hokam Singh Chauhan
Mobile : 09407125190