Hi,
Is there any plan to remove the limitation mentioned below?
*Streaming aggregation doesn't support group aggregate pandas UDF*
We want to run our data modelling jobs in real time using Spark 3.0 and
Kafka 2.4, and need support for custom aggregate pandas UDFs on stream
windows.
Is there
Hi, does anyone know the behavior of dropping managed tables when using an
external Hive metastore:
does the deletion of the data (e.g. from the object store) happen from
Spark SQL, or from the external Hive metastore?
I am confused by the local mode and remote mode code paths.
> also can improve the existing CBO and make it more general. The paper of
> Spark SQL was published 5 years ago. A lot of great contributions were made
> in the past 5 years.
>
> Cheers,
>
> Xiao
>
> Debajyoti Roy wrote on Wed, Jan 15, 2020 at 9:23 AM:
>
>> Thanks all, and Matei
Thanks all, and Matei.
TL;DR of the conclusion for my particular case:
Qualitatively, while Catalyst[1] tries to mitigate the learning curve and
maintenance burden, it lacks the dynamic programming approach used by
Calcite[2] and risks falling into local minima.
Quantitatively, there is no
-as-of-join-of-two-datasets-in-apache-spark
2. Snapshot of state with time to state with effective start and end
time:
https://stackoverflow.com/questions/53928372/given-dataset-of-state-snapshots-at-time-t-how-to-transform-it-into-dataset-with/53928400#53928400
Thanks in advance!
Roy
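For the second transformation above (point-in-time snapshots to rows with
effective start and end times), one common approach is a lead-style
lookahead over time-ordered snapshots. Below is a minimal plain-Python
sketch of that idea (hypothetical names and data, not Spark and not the
linked answer's code); in Spark SQL this would correspond to a lead()
window function over rows ordered by timestamp.

```python
# Plain-Python sketch (hypothetical names/data, not Spark): each snapshot's
# effective end time is the next snapshot's timestamp, mirroring a lead()
# window function over rows ordered by time.
def snapshots_to_intervals(snapshots, end_of_time=None):
    """snapshots: list of (timestamp, state); returns (state, start, end)."""
    snaps = sorted(snapshots)                      # order by timestamp
    out = []
    for i, (t, state) in enumerate(snaps):
        # Next snapshot's time closes this interval; the last one stays open.
        end = snaps[i + 1][0] if i + 1 < len(snaps) else end_of_time
        out.append((state, t, end))                # effective [start, end)
    return out

print(snapshots_to_intervals([(1, "a"), (3, "b"), (7, "c")]))
# [('a', 1, 3), ('b', 3, 7), ('c', 7, None)]
```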
The problem statement and an approach to solving it using windows are
described here:
https://stackoverflow.com/questions/52509498/given-events-with-start-and-end-times-how-to-count-the-number-of-simultaneous-e
Looking for more elegant/performant solutions, if they exist. TIA!
Does anyone know if this is even possible?
Thanks...
Roy
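For reference, the simultaneous-events count in the question above can also
be expressed as a sweep line over start/end points. A minimal plain-Python
sketch (hypothetical data, not Spark, and not the code from the linked
question):

```python
# Sweep-line sketch (plain Python, hypothetical data): turn each event into
# a +1 at its start and a -1 at its end, sort by time, and track the
# running total; its maximum is the peak number of simultaneous events.
def max_simultaneous(events):
    """events: list of (start, end) pairs; returns peak concurrent count."""
    points = []
    for start, end in events:
        points.append((start, 1))   # event opens: +1
        points.append((end, -1))    # event closes: -1
    # Sort by time; at equal times process closes (-1) before opens (+1),
    # treating intervals as half-open [start, end).
    points.sort(key=lambda p: (p[0], p[1]))
    current = peak = 0
    for _, delta in points:
        current += delta
        peak = max(peak, current)
    return peak

print(max_simultaneous([(1, 5), (2, 6), (4, 8), (9, 10)]))  # 3
```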
Hi, we are using CDH 5.4.0 with Spark 1.5.2 (which doesn't come with CDH 5.4.0).
I am following this link
https://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html to
try to test/create a new algorithm with Mahout item-similarity.
I am running the following command:
./bin/mahout
I want to load an HBase table into Spark.
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
sc.newAPIHadoopRDD(conf, TableInputFormat.class,
ImmutableBytesWritable.class, Result.class);
*When I call hBaseRDD.count(), I get this error:*
Caused by: java.lang.IllegalStateException: The input format
Hi,
We have Python 2.6 (default) on the cluster and have also installed
Python 2.7.
I was looking for a way to set the Python version in spark-submit.
Does anyone know how to do this?
Thanks
Hi,
We are running Spark 1.3 on CDH 5.4.1 on top of YARN. We want to know how
to control the task timeout so that when a node fails, tasks running on it
are restarted on another node. At present the job waits approximately 10
minutes to restart the tasks that were running on the failed node.
Hi,
Is there any way to make the Spark driver run inside YARN containers
rather than on the gateway/client machine?
At present, even with the config parameters --master yarn and --deploy-mode
cluster, the driver runs on the gateway/client machine.
We are on CDH 5.4.1 with YARN and Spark 1.3.
Any help on this?
We tried --master yarn-client with no different result.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/The-auxService-spark-shuffle-does-not-exist-tp23662p23689.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I am getting the following error for a simple Spark job.
I am running the following command:
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode
cluster --master yarn
/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples-1.2.0-cdh5.3.1-hadoop2.5.0-cdh5.3.1.jar
but the job doesn't show any
Hi,
Is there a way to get the YARN application ID inside a Spark application
when running a Spark job on YARN?
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Yarn-application-ID-for-Spark-job-on-Yarn-tp23429.html
Hi,
Our Spark job on YARN suddenly started failing silently without showing
any error; the following is the trace.
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property:
spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property:
Hi,
suddenly our Spark job on YARN started failing silently without showing
any error; the following is the trace in verbose mode:
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property:
spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default
Hi,
Suddenly Spark jobs started failing with the following error:
Exception in thread "main" java.io.FileNotFoundException:
/user/spark/applicationHistory/application_1432824195832_1275.inprogress (No
such file or directory)
Full trace here:
[21:50:04 x...@hadoop-client01.dev:~]$ spark-submit --class
This got resolved after cleaning /user/spark/applicationHistory/*
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-HistoryServer-not-coming-up-tp22975p22981.html
Hi,
After restarting the Spark HistoryServer, it failed to come up. I checked
the logs for the Spark HistoryServer and found the following messages:
2015-05-21 11:38:03,790 WARN org.apache.spark.scheduler.ReplayListenerBus:
Log path provided contains no log files.
2015-05-21 11:38:52,319 INFO
I have a key-value RDD whose key is a timestamp (femtosecond resolution, so
grouping buys me nothing), and I want to reduce it in chronological order.
How do I do that in Spark?
I am fine with reducing contiguous sections of the set separately and then
aggregating the resulting objects locally.
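The pattern described above, sort by timestamp, reduce contiguous sections
separately, then fold the per-section results together in order, can be
sketched in plain Python (hypothetical names and data, not Spark). In Spark
this would roughly correspond to a global sortByKey, a per-partition reduce
via mapPartitions, and a local in-order fold of the collected partial
results; the result matches a single ordered reduce whenever the combine
function is associative.

```python
# Plain-Python sketch (hypothetical names/data): sort by key, reduce each
# contiguous chunk independently, then fold the chunk results in order.
from functools import reduce

def chunked_ordered_reduce(pairs, combine, chunk_size=2):
    """pairs: list of (timestamp, value); combine: associative fn on values."""
    ordered = sorted(pairs, key=lambda kv: kv[0])       # global sort by key
    chunks = [ordered[i:i + chunk_size]
              for i in range(0, len(ordered), chunk_size)]
    # Reduce each contiguous chunk separately (in Spark: per partition)...
    partials = [reduce(combine, (v for _, v in chunk)) for chunk in chunks]
    # ...then aggregate the per-chunk results locally, still in order.
    return reduce(combine, partials)

# An order-sensitive combine (string concat) shows chronological order holds.
data = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]
print(chunked_ordered_reduce(data, lambda x, y: x + y))  # abcd
```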
Hi,
When we start a Spark job, it starts a new HTTP server for each new job.
Is it possible to disable the HTTP server for each job?
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Possible-to-disable-Spark-HTTP-server-tp22772.html
Hi,
I recently enabled log4j.rootCategory=WARN, console in the Spark
configuration, but after that spark.logConf=true became ineffective.
I just want to confirm: is this because of log4j.rootCategory=WARN?
Thanks
Hi,
My Spark job is failing with the following error message:
org.apache.spark.shuffle.FetchFailedException:
/mnt/ephemeral12/yarn/nm/usercache/abc/appcache/application_1429353954024_1691/spark-local-20150418132335-0723/28/shuffle_3_1_0.index
(No such file or directory)
at
Hi,
How do I get a Spark job progress-style report on the console?
I tried setting --conf spark.ui.showConsoleProgress=true but it
thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-job-progress-style-report-on-console-tp22440.html
How do I build the Spark SQL Avro library for Spark 1.2?
I was following https://github.com/databricks/spark-avro and was able
to build spark-avro_2.10-1.0.0.jar by simply running sbt/sbt package from
the project root.
But we are on Spark 1.2 and need a compatible spark-avro jar.
Any idea how
Hi,
We have a cluster running CDH 5.3.2 and Spark 1.2 (the current version in
CDH 5.3.2), but we want to try Spark 1.3 without breaking the existing
setup. Is it possible to have Spark 1.3 on the existing setup?
Thanks
use zip
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/can-t-union-two-rdds-tp22320p22321.html
at
org.spark-project.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
at
org.spark-project.guava.common.cache.LocalCache.get(LocalCache.java:4000)
thanks
On Thu, Mar 26, 2015 at 7:27 PM, Roy <rp...@njit.edu> wrote:
We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2
The Jobs link on the Spark History Server doesn't open and shows the
following message:
HTTP ERROR: 500
Problem accessing /history/application_1425934191900_87572. Reason:
Server Error
Do a *netstat -pnat | grep 404* and see what all processes are
running.
Thanks
Best Regards
On Wed, Mar 25, 2015 at 1:13 AM, Roy <rp...@njit.edu> wrote:
I get the following message each time I run a Spark job:
1. 15/03/24 15:35:56 WARN AbstractLifeCycle: FAILED
SelectChannelConnector
thanks
roy
Hi,
I am using CDH 5.3.2 packages installed through Cloudera Manager 5.3.2.
I am trying to run one Spark job with the following command:
PYTHONPATH=~/code/utils/ spark-submit --master yarn --executor-memory 3G
--num-executors 30 --driver-memory 2G --executor-cores 2 --name=analytics
Hi guys,
Comparing these two architectures, why does BDAS put YARN and Mesos under
HDFS? Was there any special consideration, or is it just an easy way to
depict the AMPLab stack?
Best regards!