Hi Ayan,
Thanks for your response.
In my case the constraint is I have to use Hive 0.14 for some other
use cases.
I believe the incompatibility is at the Thrift server level (the HiveServer2
which comes with Hive). If I use the Hive 0.13 HiveServer2 on the same node
as the Spark master, should that
Your parentheses don't look right; you're embedding the filter inside the
Row.fromSeq() call.
Try this:
val trainRDD = rawTrainData
  .filter(!_.isEmpty)
  .map(rawRow => Row.fromSeq(rawRow.split(",")))
  .filter(_.length == 15)
  .map(_.toString).map(_.trim)
-Don
On Fri,
ah, that explains it, many thanks!
On Sat, May 16, 2015 at 7:41 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
oh...metastore_db location is not controlled by
hive.metastore.warehouse.dir -- one is the location of your metastore DB,
the other is the physical location of your stored data.
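If you want metastore_db (the Derby database itself) in a fixed place rather than the current working directory, the usual knob is the JDO connection URL in hive-site.xml. A sketch, with the property name from the Hive docs and a placeholder path:

```xml
<!-- hive-site.xml (sketch): pin the embedded Derby metastore to a fixed path.
     The databaseName path below is a placeholder; adjust for your deployment. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore_db;create=true</value>
</property>
```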
Is anyone else having issues when building spark from git?
I created a jira ticket with a Docker file that reproduces the issue.
The error:
/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
error: not found: type Type
protected Type type() {
Hi,
I'm trying to execute a simple SQL statement from spark-shell.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) - This one
executes properly.
Next I'm trying -
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
This keeps on trying to connect to the metastore but
Hi,
is it expected that I can't reference a column inside an IF statement like this:
sctx.sql("SELECT name, IF(ts > 0, price, 0) FROM table").collect()
I get an error:
org.apache.spark.sql.AnalysisException: unresolved operator 'Project [name#0,if
((CAST(ts#1, DoubleType) > CAST(0, DoubleType))) price#2
Any resolution to this? I'm having the same problem.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-YARN-on-AWS-EMR-Issues-finding-file-on-hdfs-tp10214p22918.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Gave it another try - it seems that it picks up the variable and prints out
the correct value, but still puts the metastore_db folder in the current
directory, regardless.
On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor jambo...@gmail.com wrote:
Thank you for the reply.
I have tried your
oh...metastore_db location is not controlled by
hive.metastore.warehouse.dir -- one is the location of your metastore DB,
the other is the physical location of your stored data. Check out this SO
thread:
http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive
On Sat,
Any resolution to this? I am having the same problem.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/zip-files-submitted-with-py-files-disappear-from-hdfs-after-a-while-on-EMR-tp22342p22919.html
Sent from the Apache Spark User List mailing list archive at
Try like this
SELECT name, case when ts > 0 then price else 0 end from table
On Sun, May 17, 2015 at 12:21 AM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
is it expected that I can't reference a column inside an IF statement like this:
sctx.sql("SELECT name, IF(ts > 0, price, 0) FROM
Hi
Try with Hive 0.13. If I am not wrong, Hive 0.14 is not supported yet,
definitely not with 1.2.1 :)
On Sun, May 17, 2015 at 1:14 AM, smazumder sourav.mazumde...@gmail.com
wrote:
Hi,
I'm trying to execute a simple SQL statement from spark-shell.
val sqlContext = new
Here it is from the documentation:
Spark SQL is designed to be compatible with the Hive Metastore, SerDes and
UDFs. Currently Spark SQL is based on Hive 0.12.0 and 0.13.1.
On Sun, May 17, 2015 at 1:48 AM, ayan guha guha.a...@gmail.com wrote:
Hi
Try with Hive 0.13. If I am not wrong, Hive 0.14 is
Hi Ayan and Helena,
I've considered using Cassandra/HBase but ended up opting to save to worker
hdfs because I want to take advantage of the data locality since the data
will often be loaded to Spark for further processing. I was also under the
impression that saving to filesystem (instead of db)
Hi
If you asked any DB developer, s/he would tell you the construct:
select * from (
select userid, time, state,
rank() over (partition by userId order by time desc) r
from event) where r = 1
I am not sure if DataFrame supports it, though I am sure we can extend
functions to implement it.
But here is one not
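Independently of the SQL route, the windowed query above boils down to "keep the latest event per user". A minimal sketch of that selection on a plain Scala collection (on an RDD the same idea would be a reduceByKey keeping the max-time event; the Event fields here are illustrative):

```scala
// Sketch: "rank() over (partition by userId order by time desc) ... where r = 1"
// expressed as a latest-event-per-user selection on a plain collection.
case class Event(userId: String, time: Long, state: String)

val events = Seq(
  Event("u1", 1L, "login"),
  Event("u1", 5L, "logout"),
  Event("u2", 2L, "login"),
  Event("u2", 3L, "idle"))

// For each user, keep the event with the greatest timestamp.
val latest = events.groupBy(_.userId).values.map(_.maxBy(_.time)).toList
```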
Thank you for the reply.
I have tried your experiment, it seems that it does not print the settings
out in spark-shell (I'm using 1.3 by the way).
Strangely I have been experimenting with an SQL connection instead, which
works after all (still if I go to spark-shell and try to print out the SQL
Consider using Cassandra with Spark Streaming and time series; Cassandra has
been doing time series for years.
Here are some snippets with Kafka streaming and writing/reading the data back:
For the Spark Streaming app, if you want a particular action inside a
foreachRDD to go to a particular pool, then make sure you set the pool
within the foreachRDD function. E.g.
dstream.foreachRDD { rdd =>
  rdd.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1") // set the pool
  // actions on rdd submitted here will run in pool1
}
All - this issue showed up when I was tearing down a spark context and
creating a new one. Often, I was unable to then write to HDFS due to this
error. I subsequently switched to a different implementation where instead
of tearing down and reinitializing the Spark context, I'd instead submit a
What Spark release are you using?
Can you check the driver log to see if there is some clue there?
Thanks
On Sat, May 16, 2015 at 12:01 AM, xiaohe lan zombiexco...@gmail.com wrote:
Hi,
I have a 5-node YARN cluster, and I used spark-submit to submit a simple app.
spark-submit --master yarn
Hello,
I am using MLPipeline. I would like to extract the best parameter found by
CrossValidator, but I cannot find much documentation about how to do it. Can
anyone give me some pointers?
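For what it's worth, the "best parameter" choice is just an argmax over the per-candidate average metrics. A sketch of that selection with plain Scala stand-ins; paramMaps and avgMetrics stand in for whatever your Spark version exposes on the fitted CrossValidatorModel, so treat those names as assumptions, not confirmed API:

```scala
// Sketch: choose the parameter map with the best average CV metric.
// paramMaps / avgMetrics are stand-ins for values pulled off a fitted
// CrossValidatorModel; the names and shapes here are illustrative only.
val paramMaps  = Seq(Map("regParam" -> 0.1), Map("regParam" -> 0.01), Map("regParam" -> 1.0))
val avgMetrics = Seq(0.82, 0.87, 0.79)

// Pair each candidate with its metric and take the argmax.
val (bestParams, bestMetric) = paramMaps.zip(avgMetrics).maxBy(_._2)
```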
Thanks.
Justin
--
View this message in context:
Hi all,
Recently I've run into a scenario where I need to conduct two-sample tests between all
paired combinations of columns of an RDD. But the networking load and
generation of the pair-wise computations is too time consuming. That has puzzled
me for a long time. I want to conduct a Wilcoxon rank-sum test