On 24 Mar 2015, at 02:10, Marcelo Vanzin van...@cloudera.com wrote:
This happens most probably because the Spark 1.3 you have downloaded
is built against an older version of the Hadoop libraries than those
used by CDH, and those libraries cannot parse the container IDs
generated by CDH.
Hello,
As of now, if I have to execute a Spark job, I need to create a jar and
deploy it. If I need to run a dynamically formed SQL from a Web
application, is there any way of using SparkSQL in this manner? Perhaps,
through a Web Service or something similar.
Regards,
Ashish
What is the performance overhead caused by YARN, or what configurations are
changed when the app is run through YARN?
The following example:
sqlContext.sql("""
  SELECT dayStamp(date), count(distinct deviceId) AS c
  FROM full
  GROUP BY dayStamp(date)
  ORDER BY c DESC
  LIMIT 10
""").collect()
runs on shell
That's probably the problem; the intended path is on HDFS but the
configuration specifies a local path. See the exception message.
On Tue, Mar 24, 2015 at 1:08 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
It's in your local file system, not in HDFS.
Thanks
Best Regards
On Tue, Mar 24,
thanks Sean,
please can you suggest in which file or configuration I need to set the
proper path? Please elaborate, which may help.
thanks,
Regards
Sachin
On Tue, Mar 24, 2015 at 7:15 PM, Sean Owen so...@cloudera.com wrote:
That's probably the problem; the intended path is on HDFS but the
Streaming _from_ cassandra, CassandraInputDStream, is coming BTW
https://issues.apache.org/jira/browse/SPARK-6283
I am working on it now.
Helena
@helenaedelson
On Mar 23, 2015, at 5:22 AM, Khanderao Kand Gmail khanderao.k...@gmail.com
wrote:
Hi Akhil,
thanks for your quick reply,
I would like to request that you please elaborate, i.e. what kind of permission
is required.
thanks in advance,
Regards
Sachin
On Tue, Mar 24, 2015 at 5:29 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It's an IOException; just make sure you have the correct
Hello!
I would like to know the optimal way of getting the header from a CSV file
with Spark. My approach was:
def getHeader(data: RDD[String]): String = {
  data.zipWithIndex().filter(_._2 == 0).map(x => x._1).take(1).mkString()
}
Thanks.
Hello,
in the context of SPARK-2394 Make it easier to read LZO-compressed files
from EC2 clusters https://issues.apache.org/jira/browse/SPARK-2394 , I
was wondering:
Is there an easy way to make a user-provided script run on every machine in
a cluster launched on EC2?
Regards,
Theodore
--
hi,
I can see the required permission is granted for this directory, as below:
hadoop dfs -ls /user/spark
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
drwxrwxrwt - spark spark 0 2015-03-20 01:04
It's in your local file system, not in HDFS.
Thanks
Best Regards
On Tue, Mar 24, 2015 at 6:25 PM, Sachin Singh sachin.sha...@gmail.com
wrote:
hi,
I can see the required permission is granted for this directory, as below:
hadoop dfs -ls /user/spark
DEPRECATED: Use of this script to execute hdfs
write permission, as it's clearly saying:
java.io.IOException: Error in creating log directory:
file:/user/spark/applicationHistory/application_1427194309307_0005
Thanks
Best Regards
On Tue, Mar 24, 2015 at 6:08 PM, Sachin Singh sachin.sha...@gmail.com
wrote:
Hi Akhil,
thanks for your
hi all,
all of a sudden I am getting the below error when I submit a Spark job using
master as yarn; it's not able to create the Spark context. Previously it was working fine.
I am using CDH 5.3.1 and creating a JavaHiveContext.
spark-submit --jars
Those implementations are computing an SVD of the input matrix
directly, and while you generally need the columns to have mean 0, you
can turn that off with the options you cite.
I don't think this is possible in the MLlib implementation, since it
is computing the principal components by
Hi Ashish,
this might be what you're looking for:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
Regards,
Jeff
2015-03-24 11:28 GMT+01:00 Ashish Mukherjee ashish.mukher...@gmail.com:
Hello,
As of now, if I have to execute a Spark job, I
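A minimal sketch of that approach: start the Thrift server (sbin/start-thriftserver.sh) and submit dynamically built SQL over plain JDBC from the web application. The jdbc:hive2 URL and org.apache.hive.jdbc.HiveDriver class are the standard HiveServer2 interface; the hostname, port, credentials, and table name below are illustrative assumptions, not from the thread.
import java.sql.DriverManager

Class.forName("org.apache.hive.jdbc.HiveDriver")  // Hive JDBC driver must be on the classpath
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT count(*) FROM some_table")  // hypothetical, dynamically built SQL
  while (rs.next()) println(rs.getLong(1))
} finally {
  conn.close()
}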
Hello Michael,
Thanks for your quick reply.
My question wrt Java/Scala was related to extending the classes to support
new custom data sources, so was wondering if those could be written in
Java, since our company is a Java shop.
The additional push downs I am looking for are aggregations with
Yeah thanks, I can now see the memory usage.
Please also verify if bytes read == Combined size of all RDDs ?
Actually, all my RDDs are completely cached in memory. So, Combined size of
my RDDs = Mem used (verified from WebUI)
On Fri, Mar 20, 2015 at 12:07 PM, Akhil Das
It's an IOException; just make sure you have the correct permission
over the /user/spark directory.
Thanks
Best Regards
On Tue, Mar 24, 2015 at 5:21 PM, sachin Singh sachin.sha...@gmail.com
wrote:
hi all,
all of a sudden I am getting the below error when I submit a Spark job using
master as
Hi,
I am doing ML using Spark MLlib. However, I do not have full control of the
cluster. I am using Microsoft Azure HDInsight.
I want to deploy the BLAS or whatever required dependencies to accelerate
the computation. But I don't know how to deploy those DLLs when I submit my
JAR to the cluster.
I don't think there's a general approach to that - the use cases are just
too different. If you really need it, you will probably have to implement it
yourself in the driver of your application.
PS: Make sure to use the reply to all button so that the mailing list is
included in your reply.
Perhaps this project, https://github.com/calrissian/spark-jetty-server,
could help with your requirements.
On Tue, Mar 24, 2015 at 7:12 AM, Jeffrey Jedele jeffrey.jed...@gmail.com
wrote:
I don't think there's a general approach to that - the use cases are just
too different. If you really need
I have code that works under 1.2.1 but when I upgraded to 1.3.0 it fails to
find the s3 hadoop file system.
I get the java.lang.IllegalArgumentException: Wrong FS: s3://[path to my
file], expected: file:/// when I try to save a parquet file. This worked in
1.2.1.
Has anyone else seen this?
I'm
Thanks Marcelo - I was using the SBT-built Spark per the earlier thread. I
have now switched to the distro (with the conf changes for the CDH path in front)
and the Guava issue is gone.
Thanks,
On Tue, Mar 24, 2015 at 1:50 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi there,
On Tue, Mar 24, 2015 at 1:40
Reza,
That SVD.V matches the H2O and R prcomp (non-centered) output.
Thanks
-R
On Tue, Mar 24, 2015 at 11:38 AM, Sean Owen so...@cloudera.com wrote:
(Oh sorry, I've only been thinking of TallSkinnySVD)
On Tue, Mar 24, 2015 at 6:36 PM, Reza Zadeh r...@databricks.com wrote:
If you want to do a
My memory is hazy on this but aren't there hidden limitations to
Linux-based threads? I ran into some issues a couple of years ago where,
and here is the fuzzy part, the kernel wants to reserve virtual memory per
thread equal to the stack size. When the total amount of reserved memory
(not
Awesome, yep - I have seen the warnings on UDTs; happy to keep up with the
API changes :). Would this be a reasonable PR to toss up despite the API
instability, or would you prefer it to wait?
Thanks
-Pat
On Tue, Mar 24, 2015 at 7:44 PM, Michael Armbrust mich...@databricks.com
wrote:
I'll
Great!
On Tue, Mar 24, 2015 at 2:53 PM, roni roni.epi...@gmail.com wrote:
Reza,
That SVD.V matches the H2O and R prcomp (non-centered) output.
Thanks
-R
On Tue, Mar 24, 2015 at 11:38 AM, Sean Owen so...@cloudera.com wrote:
(Oh sorry, I've only been thinking of TallSkinnySVD)
On Tue, Mar 24,
hi all,
I've got a Vagrant image with Spark Notebook, Spark, Accumulo, and Hadoop all
running. From the notebook I can manually create a scanner and pull test data from
a table I created using one of the Accumulo examples:
val instanceNameS = "accumulo"
val zooServersS = "localhost:2181"
val instance:
Also look at the spark-kernel and spark job server projects.
Irfan
On Mar 24, 2015 5:03 AM, Todd Nist tsind...@gmail.com wrote:
Perhaps this project, https://github.com/calrissian/spark-jetty-server,
could help with your requirements.
On Tue, Mar 24, 2015 at 7:12 AM, Jeffrey Jedele
Hi Yuichiro,
The way to avoid this is to boost spark.yarn.executor.memoryOverhead until
the executors have enough off-heap memory to avoid going over their limits.
-Sandy
On Tue, Mar 24, 2015 at 11:49 AM, Yuichiro Sakamoto ks...@muc.biglobe.ne.jp
wrote:
Hello.
We use ALS(Collaborative
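A sketch of what Sandy's suggestion looks like in configuration terms; the value is illustrative only and should be tuned for the job (the property is in MB of off-heap headroom per executor):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("als-example")
  .set("spark.yarn.executor.memoryOverhead", "1024")  // extra off-heap headroom per executor, in MB
val sc = new SparkContext(conf)
The same property can also be passed at submit time with --conf spark.yarn.executor.memoryOverhead=1024.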
Imran,
great, I will take a look at the pull request. It seems we are interested in
similar things.
On Tue, Mar 24, 2015 at 11:00 AM, Imran Rashid iras...@cloudera.com wrote:
I think writing to hdfs and reading it back again is totally reasonable.
In fact, in my experience, writing to hdfs and reading
Both spark-submit and spark-shell have a --jars option for passing
additional jars to the cluster. They will be added to the appropriate
classpaths.
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
Hello.
We use ALS (collaborative filtering) from Spark MLlib on YARN.
The Spark version is 1.2.0, included in CDH 5.3.1.
1,000,000,000 records (5,000,000 users' data and 5,000,000 items' data) are
used for machine learning with ALS.
These large quantities of data increase virtual memory usage,
node manager
Hello Sandy,
Thank you for your explanation. Then I would at least expect that to be
consistent across local, yarn-client, and yarn-cluster modes. (And not lead
to the case where it somehow works in two of them, and not for the third).
Kind regards,
Emre Sevinç
http://www.bigindustries.be/
On
I think writing to hdfs and reading it back again is totally reasonable.
In fact, in my experience, writing to hdfs and reading back in actually
gives you a good opportunity to handle some other issues as well:
a) instead of just writing as an object file, I've found its helpful to
write in a
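A minimal sketch of that write-then-read-back pattern, assuming object files and an illustrative HDFS path (not Imran's actual code):
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

def writeThenRead[T: ClassTag](sc: SparkContext, rdd: RDD[T], path: String): RDD[T] = {
  rdd.saveAsObjectFile(path)   // materialize one part-file per partition on HDFS
  sc.objectFile[T](path)       // read it back as a fresh RDD with a short lineage
}
// usage (path is an assumption):
// val fresh = writeThenRead(sc, expensiveRdd, "hdfs:///tmp/intermediate/expensive")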
Helena,
The CassandraInputDStream sounds interesting. I don't find much in
the JIRA though. Do you have more details on what it tries to achieve?
Thanks,
Anwar.
On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson helena.edel...@datastax.com
wrote:
Streaming _from_ cassandra,
I found the problem.
In mapred-site.xml, mapreduce.application.classpath has references to
“${hdp.version}” which is not getting replaced
when launch_container.sh is created. The executor fails with a substitution
error at line 27 in launch_container.sh because bash
can’t deal with
Ah, yes, I believe this is because only properties prefixed with spark
get passed on. The purpose of the --conf option is to allow passing
Spark properties to the SparkConf, not to add general key-value pairs to
the JVM system properties.
-Sandy
On Tue, Mar 24, 2015 at 4:25 AM, Emre Sevinc
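A small sketch of the workaround this implies (the key and value are illustrative): an arbitrary JVM system property has to ride inside a spark.* property such as the extraJavaOptions settings, since only spark.* entries are forwarded.
import org.apache.spark.SparkConf

// Executor JVMs start after the conf is read, so this takes effect there; for the
// driver JVM the same property is normally supplied at submit time, e.g.
// --conf "spark.driver.extraJavaOptions=-Dmy.custom.key=value"
val conf = new SparkConf()
  .setAppName("conf-passthrough-example")
  .set("spark.executor.extraJavaOptions", "-Dmy.custom.key=value")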
I created a JIRA ticket for my work in both the Spark and
spark-cassandra-connector JIRAs; I don't know why you cannot see them.
Users can stream from any Cassandra table, just as one can stream from a Kafka
topic; same principle.
Helena
@helenaedelson
On Mar 24, 2015, at 11:29 AM, Anwar
I am wondering if HiveContext connects to HiveServer2 or whether it works through
the Hive CLI. The reason I am asking is because Cloudera has deprecated the Hive
CLI.
If the connection is through HiverServer2, is there a way to specify user
credentials?
--
View this message in context:
Steve, that's correct, but the problem only shows up when different
versions of the YARN jars are included on the classpath.
-Sandy
On Tue, Mar 24, 2015 at 6:29 AM, Steve Loughran ste...@hortonworks.com
wrote:
On 24 Mar 2015, at 02:10, Marcelo Vanzin van...@cloudera.com wrote:
This
The following statement appears in the Scala API example at
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame
people.filter("age" > 30).
I tried this example and it gave a compilation error. I think this needs to
be changed to people.filter(people("age") > 30).
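For reference, a short sketch of the two forms that do compile against the 1.3 DataFrame API, assuming people is a DataFrame with an age column as in the scaladoc example:
val adults1 = people.filter(people("age") > 30)  // explicit Column expression
val adults2 = people.filter("age > 30")          // condition as a SQL expression string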
I'm working with GraphX to calculate the PageRank of an extremely large social
network with billions of vertices.
As the iteration number increases, each iteration becomes slower, to the point
of being unacceptable. Is there any reason for this?
How can I accelerate the iteration process?
Hey Jim,
Thanks for reporting this. Can you give a small end-to-end code
example that reproduces it? If so, we can definitely fix it.
- Patrick
On Tue, Mar 24, 2015 at 4:55 PM, Jim Carroll jimfcarr...@gmail.com wrote:
I have code that works under 1.2.1 but when I upgraded to 1.3.0 it fails to
This might be because partitions are getting dropped from memory and
needing to be recomputed. How much memory is in the cluster, and how large
are the partitions? This information should be in the Executors and Storage
pages in the web UI.
Ankur http://www.ankurdave.com/
On Tue, Mar 24, 2015 at
You are probably hitting SPARK-6351
https://issues.apache.org/jira/browse/SPARK-6351, which will be fixed in
1.3.1 (hopefully cutting an RC this week).
On Tue, Mar 24, 2015 at 4:55 PM, Jim Carroll jimfcarr...@gmail.com wrote:
I have code that works under 1.2.1 but when I upgraded to 1.3.0 it
Hi,
I am trying to port some code that was working in Spark 1.2.0 to the latest
version, Spark 1.3.0. This code involves a left outer join between two
SchemaRDDs which I am now trying to change to a left outer join between 2
DataFrames. I followed the example for left outer join of DataFrame at
You need to use `===`, so that you are constructing a column expression
instead of evaluating the standard Scala equality method. Calling methods
to access columns (i.e. df.country) is only supported in Python.
val join_df = df1.join(df2, df1("country") === df2("country"), "left_outer")
On Tue, Mar
So,
1. I reduced my -XX:ThreadStackSize to 5m (instead of 10m - default is
1m), which is still OK for my need.
2. I reduced the executor memory to 44GB for a 60GB machine (instead of
49GB).
This seems to have helped. Thanks to Matthew and Sean.
Thomas
On Tue, Mar 24, 2015 at 3:49 PM, Matt
thanks Sean and Akhil,
I changed the permission of /user/spark/applicationHistory, and now it
works.
On Tue, Mar 24, 2015 at 7:35 PM, Sachin Singh sachin.sha...@gmail.com
wrote:
thanks Sean,
please can you suggest in which file or configuration I need to set the
proper path, please
The error you're seeing typically means that you cannot connect to the Hive
metastore itself. Some quick thoughts:
- If you were to run show tables (instead of the CREATE TABLE statement),
are you still getting the same error?
- To confirm, the Hive metastore (MySQL database) is up and running
Hi Denny,
Still facing the same issue. Please find the following errors.
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@4e4f880c
scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS
No, I am just running the ./spark-shell command in the terminal. I will try with
the above command.
On Wed, Mar 25, 2015 at 11:09 AM, Denny Lee denny.g@gmail.com wrote:
Did you include the connection to a MySQL connector jar so that way
spark-shell / hive can connect to the metastore?
For example, when
Did you include the connection to a MySQL connector jar so that way
spark-shell / hive can connect to the metastore?
For example, when I run my spark-shell instance in standalone mode, I use:
./spark-shell --master spark://servername:7077 --driver-class-path
/lib/mysql-connector-java-5.1.27.jar
http://spark.apache.org/docs/latest/building-spark.html#packaging-without-hadoop-dependencies-for-yarn
does not list Hadoop 2.5 in the Hadoop version table, etc.
I assume it is still OK to compile with -Pyarn -Phadoop-2.5 for use with
Hadoop 2.5 (cdh 5.3.2)
Thanks,
Any Ideas on this?
spark-submit --files /path/to/hive-site.xml
On Tue, Mar 24, 2015 at 10:31 AM, Udit Mehta ume...@groupon.com wrote:
Another question related to this, how can we propagate the hive-site.xml to
all workers when running in the yarn cluster mode?
On Tue, Mar 24, 2015 at 10:09 AM, Marcelo Vanzin
Hi all,
I have been trying out the new dataframe api in 1.3, which looks great by
the way.
I have found an example to define udfs and add them to select operations,
like this:
slen = F.udf(lambda s: len(s), IntegerType())
df.select(df.age, slen(df.name).alias('slen')).collect()
is it possible
The only UDAFs that we support today are those defined using the Hive UDAF
API. Otherwise you'll have to drop into Spark operations. I'd suggest
opening a JIRA.
On Tue, Mar 24, 2015 at 10:49 AM, jamborta jambo...@gmail.com wrote:
Hi all,
I have been trying out the new dataframe api in 1.3,
If you want to do a nonstandard (or uncentered) PCA, you can call
computeSVD on RowMatrix, and look at the resulting 'V' Matrix.
That should match the output of the other two systems.
Reza
On Tue, Mar 24, 2015 at 3:53 AM, Sean Owen so...@cloudera.com wrote:
Those implementations are computing
(Oh sorry, I've only been thinking of TallSkinnySVD)
On Tue, Mar 24, 2015 at 6:36 PM, Reza Zadeh r...@databricks.com wrote:
If you want to do a nonstandard (or uncentered) PCA, you can call
computeSVD on RowMatrix, and look at the resulting 'V' Matrix.
That should match the output of the
I would recommend uploading those jars to HDFS, and using the --jars
option in spark-submit with a URI from HDFS instead of a URI from the local
filesystem. This avoids the problem of fetching jars from the
driver, which can be a bottleneck.
Sincerely,
DB Tsai
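For example (the paths and class name are illustrative assumptions, not from the thread), the dependency jars can be pushed to HDFS once and then referenced directly:
hadoop fs -put libs/dep1.jar /user/me/libs/
spark-submit --master yarn-cluster --jars hdfs:///user/me/libs/dep1.jar --class com.example.MyApp my-app.jar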
I think this works in practice, but I don't know that the first block
of the file is guaranteed to be in the first partition? certainly
later down the pipeline that won't be true but presumably this is
happening right after reading the file.
I've always just written some filter that would only
It does neither. If you provide a Hive configuration to Spark,
HiveContext will connect to your metastore server, otherwise it will
create its own metastore in the working directory (IIRC).
On Tue, Mar 24, 2015 at 8:58 AM, nitinkak001 nitinkak...@gmail.com wrote:
I am wondering if HiveContext
I checked and apparently it hasn't been released yet. It will be available
in the upcoming CDH 5.4 release.
-Sandy
On Mon, Mar 23, 2015 at 1:32 PM, Nitin kak nitinkak...@gmail.com wrote:
I know there was an effort for this; do you know in which version of the Cloudera
distribution we could find it?
I am facing the same issue as listed here:
http://apache-spark-user-list.1001560.n3.nabble.com/Packaging-a-spark-job-using-maven-td5615.html
Solution mentioned is here:
https://gist.github.com/prb/d776a47bd164f704eecb
However, I don't understand a few things:
1) Why are jars being split
Hi all,
We have a cloud application, to which we are adding a reporting service.
For this we have narrowed down to use Cassandra + Spark for data store
and processing respectively.
Since the cloud application is separate from the Cassandra + Spark deployment,
what is the ideal method to interact with Spark
Good point. There's no guarantee that you'll get the actual first
partition. One reason why I wouldn't allow a CSV header line in a real data
file, if I could avoid it.
Back to Spark, a safer approach is RDD.foreachPartition, which takes a
function expecting an iterator. You'll only need to grab
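A sketch of the partition-aware variant (not Dean's exact code), using mapPartitionsWithIndex so only the partition that actually starts the file drops a line:
val withoutHeader = data.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1) else iter  // assumes the header really is the first line of partition 0
}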
From the exception it seems like your app is also repackaging Scala
classes somehow. Can you double check that and remove the Scala
classes from your app if they're there?
On Mon, Mar 23, 2015 at 10:07 PM, Alexey Zinoviev
alexey.zinov...@gmail.com wrote:
Thanks Marcelo, this options solved the
I am reading about combineByKey and going through the below example from one of
the blog posts, but I can't understand how it works step by step. Can someone
please explain?
case class Fruit(kind: String, weight: Int) {
  def makeJuice: Juice = Juice(weight * 100)
}
case
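A self-contained sketch of how the three combineByKey functions fit together; the Juice class and the sample data are assumptions filled in for illustration:
case class Fruit(kind: String, weight: Int) {
  def makeJuice: Juice = Juice(weight * 100)
}
case class Juice(volume: Int) {
  def add(other: Juice): Juice = Juice(volume + other.volume)
}

val fruits = sc.parallelize(Seq(
  ("apple", Fruit("apple", 2)),
  ("orange", Fruit("orange", 3)),
  ("apple", Fruit("apple", 1))
))

val juicePerKind = fruits.combineByKey(
  (f: Fruit) => f.makeJuice,                   // createCombiner: first Fruit seen for a key in a partition
  (j: Juice, f: Fruit) => j.add(f.makeJuice),  // mergeValue: fold further Fruits for that key into the combiner
  (j1: Juice, j2: Juice) => j1.add(j2)         // mergeCombiners: merge per-partition results across partitions
)
// juicePerKind.collect() => Array(("apple", Juice(300)), ("orange", Juice(300)))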
My question wrt Java/Scala was related to extending the classes to support
new custom data sources, so was wondering if those could be written in
Java, since our company is a Java shop.
Yes, you should be able to extend the required interfaces using Java.
The additional push downs I am
Can my new book, Spark GraphX In Action, which is currently in MEAP
http://manning.com/malak/, be added to
https://spark.apache.org/documentation.html and, if appropriate, to
https://spark.apache.org/graphx/ ?
Michael Malak
-
Zhan, specifying the port fixed the port issue.
Is it possible to specify the log directory while starting the Spark
thriftserver?
Still getting this error even though the folder exists and everyone has
permission to use that directory.
drwxr-xr-x 2 root root 4096 Mar 24 19:04
I get the following message each time I run a Spark job:
15/03/24 15:35:56 WARN AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address
already in use
full trace is here
http://pastebin.com/xSvRN01f
how do I fix this ?
I am on CDH 5.3.1
I'll caution that the UDTs are not a stable public interface yet. We'd
like to do this someday, but currently this feature is mostly for MLlib as
we have not finalized the API.
Having an ordering could be useful, but I'll add that currently UDTs
actually exist in serialized form, so the ordering
Hi,
We are observing a hung spark application when one of the yarn datanode
(running multiple spark executors) go down.
Setup details:
* Spark: 1.2.1
* Hadoop: 2.4.0
* Spark Application Mode: yarn-client
* 2 datanodes (DN1, DN2)
* 6 spark executors (initially 3 executors on
You can indeed override the Hadoop configuration at a per-RDD level -
though it is a little more verbose, as in the below example, and you need
to effectively make a copy of the hadoop Configuration:
val thisRDDConf = new Configuration(sc.hadoopConfiguration)
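A sketch of how that snippet might continue (the config key, input format, and path are illustrative assumptions):
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

thisRDDConf.set("some.hadoop.property", "value-for-this-rdd-only")
val rdd = sc.newAPIHadoopFile(
  "hdfs:///data/input",
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  thisRDDConf
).map(_._2.toString)  // keep just the line text from the (offset, line) pairs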
Hey all,
Currently looking into UDTs and I was wondering if it is reasonable to add
the ability to define an Ordering (or if this is possible, then how)?
Currently it will throw an error when non-Native types are used.
Thanks!
-Pat
Hello,
I am seeing various crashes in spark on large jobs which all share a
similar exception:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
I increased nproc (i.e. ulimit -u) 10 fold, but it
You can try to set it in spark-env.sh.
# - SPARK_LOG_DIR Where log files are stored. (Default:
${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
Thanks.
Zhan Zhang
On Mar 24, 2015, at 12:10 PM, Anubhav Agarwal
I am trying to compute PCA using computePrincipalComponents.
I also computed PCA using H2O in R and R's prcomp. The answers I get from
H2O and R's prcomp (non-H2O) are the same when I set the options for H2O as
standardized=FALSE and for R's prcomp as center=FALSE.
How do I make sure that the
Has this issue been fixed in Spark 1.2:
https://issues.apache.org/jira/browse/SPARK-2624
On Mon, Mar 23, 2015 at 9:19 PM, Udit Mehta ume...@groupon.com wrote:
I am trying to run a simple query to view tables in my hive metastore
using hive context.
I am getting this error:
spark Persistence
By any chance does this thread address look similar:
http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html
?
On Tue, Mar 24, 2015 at 5:23 AM Harut Martirosyan
harut.martiros...@gmail.com wrote:
What is performance overhead caused by YARN,
Instead of data.zipWithIndex().filter(_._2==0), which will cause Spark to
read the whole file, use data.take(1), which is simpler.
From the RDD.take documentation, it works by first scanning one partition,
and using the results from that partition to estimate the number of
additional partitions
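A one-liner along those lines, with illustrative val names:
val header: String = data.take(1).headOption.getOrElse("")
// or, if the RDD is known to be non-empty:
val header2 = data.first()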
Thanks all - perhaps I misread the earlier posts as being about dependencies on the
Hadoop version, but the key is also CDH 5.3.2 (not just Hadoop 2.5 vs. 2.4), etc.
After adding the classpath as Marcelo/Harsh suggested (loading the CDH libs in
front), I am able to get spark-shell started without the invalid container
Hi there,
On Tue, Mar 24, 2015 at 1:40 PM, Manoj Samel manojsamelt...@gmail.com wrote:
When I run any query, it gives java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
Are you running a custom-compiled Spark by any chance?
Hi
Does updateStateByKey pass elements to updateFunc (in Seq[V]) in the order in which
they appear in the RDD?
My guess is no, which means updateFunc needs to be commutative. Am I correct?
I've asked this question before but there were no takers.
Here are the Scala docs for updateStateByKey:
/**
*
I doubt you're hitting the limit of threads you can spawn, but as you
say, running out of memory that the JVM process is allowed to allocate
since your threads are grabbing stacks 10x bigger than usual. The
thread stacks are 4GB by themselves.
I suppose you can't avoid upping the stack size so much?
Shahab -
This should do the trick until Hao's changes are out:
sqlContext.sql("create temporary function foobar as 'com.myco.FoobarUDAF'")
sqlContext.sql("select foobar(some_column) from some_table")
This works without requiring you to 'deploy' a JAR with the UDAF in it - just
make sure the UDAF
On Tue, Mar 24, 2015 at 12:57 AM, Ashish Mukherjee
ashish.mukher...@gmail.com wrote:
1. Is the Data Source API stable as of Spark 1.3.0?
It is marked DeveloperApi, but in general we do not plan to change even
these APIs unless there is a very compelling reason to.
2. The Data Source API
Additional notes:
I did not find anything wrong with the number of threads (ps -u USER -L |
wc -l): around 780 on the master and 400 on executors. I am running on 100
r3.2xlarge.
On Tue, Mar 24, 2015 at 12:38 PM, Thomas Gerber thomas.ger...@radius.com
wrote:
Hello,
I am seeing various crashes
Hadoop 2.5 would be referenced via -Dhadoop-2.5 using the profile
-Phadoop-2.4
Please note earlier in the link the section:
# Apache Hadoop 2.4.X or 2.5.X
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package
Versions of Hadoop after 2.5.X may or may not work with the
The right invocation is still a bit different:
... -Phadoop-2.4 -Dhadoop.version=2.5.0
hadoop-2.4 == Hadoop 2.4+
On Tue, Mar 24, 2015 at 5:44 PM, Denny Lee denny.g@gmail.com wrote:
Hadoop 2.5 would be referenced via -Dhadoop-2.5 using the profile
-Phadoop-2.4
Please note earlier in
Another question related to this, how can we propagate the hive-site.xml to
all workers when running in the yarn cluster mode?
On Tue, Mar 24, 2015 at 10:09 AM, Marcelo Vanzin van...@cloudera.com
wrote:
It does neither. If you provide a Hive configuration to Spark,
HiveContext will connect to
Is there some setting I am missing?
This is my spark-env.sh:
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://100.125.5.93/sparkx.tgz
export SPARK_LOCAL_IP=127.0.0.1
Here is what I see on the slave node:
less
Hi guys.
Basically, we had to define a UDF that does that. Is there a built-in
function that we can use for it?
--
RGRDZ Harut
Hi
You can use functions like year(date) and month(date).
Thanks
Arush
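A sketch tying this back to the earlier query (the table and column names reuse that example; this assumes the SQL runs through a HiveContext, where year() and month() are built-ins):
sqlContext.sql("""
  SELECT year(date) AS y, month(date) AS m, count(distinct deviceId) AS c
  FROM full
  GROUP BY year(date), month(date)
  ORDER BY c DESC
  LIMIT 10
""").collect()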
On Tue, Mar 24, 2015 at 12:46 PM, Harut Martirosyan
harut.martiros...@gmail.com wrote:
Hi guys.
Basically, we had to define a UDF that does that. Is there a built-in
function that we can use for it?
--
RGRDZ Harut
--
Hello,
I have some questions related to the Data Sources API -
1. Is the Data Source API stable as of Spark 1.3.0?
2. The Data Source API seems to be available only in Scala. Is there any
plan to make it available for Java too?
3. Are only filters and projections pushed down to the data
Does your application actually fail?
That message just means there's another application listening on that
port. Spark should try to bind to a different one after that and keep
going.
On Tue, Mar 24, 2015 at 12:43 PM, Roy rp...@njit.edu wrote:
I get following message for each time I run spark