Hi,
Could someone recommend monitoring tools for Spark Streaming?
By extending StreamingListener we can dump the delay in processing of
batches and some alert messages.
But are there any Web UI tools where we can monitor failures, see delays in
processing, view error messages, and set up alerts?
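For reference, the listener approach I mention looks roughly like this (a
sketch; the class name and the 30-second alert threshold are just illustrative):

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Illustrative listener: logs per-batch scheduling/processing delay and
// prints an alert line when the total delay crosses a threshold (30s here).
class DelayAlertListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    val scheduling = info.schedulingDelay.getOrElse(0L)
    val processing = info.processingDelay.getOrElse(0L)
    println(s"batch=${info.batchTime} schedulingDelay=${scheduling}ms processingDelay=${processing}ms")
    if (info.totalDelay.exists(_ > 30000L)) {
      println(s"ALERT: batch ${info.batchTime} total delay ${info.totalDelay.get}ms")
    }
  }
}

// registered via: ssc.addStreamingListener(new DelayAlertListener)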
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/20 22:39:10 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 16,
localhost): java.lang.RuntimeException: hbase-default.xml file seems to be
for and old version of HBase (null), this version is
0.98.4.2.2.4.2-2-hadoop2
Thanks,
Siva.
Hi Everyone,
I am observing a strange problem while submitting a Spark Streaming job in
yarn-cluster mode through spark-submit. All the executors are using only 1
Vcore irrespective of the value of the --executor-cores parameter.
Are there any config parameters that override the --executor-cores value?
Thanks,
Hi Kalpesh,
Just to add, you could use "yarn logs -applicationId <applicationId>" to
see aggregated logs once the application is finished.
Thanks,
Sivakumar Bhavanari.
On Mon, Dec 21, 2015 at 3:56 PM, Zhan Zhang wrote:
> Hi Kalpesh,
>
> If you are using spark on yarn, it may not work.
Thanks a lot Saisai and Zhan. I see DefaultResourceCalculator is currently
being used for the Capacity Scheduler. We will change it to
DominantResourceCalculator.
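For reference, the property we plan to change (in capacity-scheduler.xml,
assuming the stock YARN Capacity Scheduler) is roughly:

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>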
Thanks,
Sivakumar Bhavanari.
On Mon, Dec 21, 2015 at 5:56 PM, Zhan Zhang wrote:
> BTW: It is not only a Yarn-webui
, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> Hi Siva,
>
> How did you know that --executor-cores is ignored and where did you see
> that only 1 Vcore is allocated?
>
> Thanks
> Saisai
>
> On Tue, Dec 22, 2015 at 9:08 AM, Siva <sbhavan...@gmail.com> wrote:
>
Just try to use "xml" as the format, like below:
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
.format("xml")
.option("rowTag", "row")
.load("A.xml");
FYR: https://gi
If it's not working,
Add the package list while executing spark-submit/spark-shell like below
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.3.3
$SPARK_HOME/bin/spark-submit --packages com.databricks:spark-xml_2.10:0.3.3
On Fri, Jun 17, 2016 at 2:56 PM, Siva
If you are running from an IDE, are you using IntelliJ?
On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:
> Can you try to package it as a jar and run it using spark-submit?
>
> Siva
>
> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>
Can you try to package it as a jar and run it using spark-submit?
Siva
On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
> I am trying to run from the IDE and everything else is working fine.
> I added the spark-xml jar and now I ended up with this dependency
>
> 6/0
Try to import the class and see if you are getting a compilation error:
import com.databricks.spark.xml
Siva
On Fri, Jun 17, 2016 at 4:02 PM, VG <vlin...@gmail.com> wrote:
> nopes. eclipse.
>
>
> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:
Hi Everyone,
Avro data written by a DataFrame to HDFS is not readable by Hive. We are saving
the data in Avro format with the statement below.
df.save("com.databricks.spark.avro", SaveMode.Append, Map("path" -> path))
We created a Hive Avro external table, and while reading it I see all nulls. Did
anyone face a similar
Hi Marco,
I did run it in the IDE (IntelliJ) as well. It works fine.
VG, make sure the right jar is in the classpath.
--Siva
On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> and your eclipse path is correct?
> i suggest, as Siva did before, to build your jar an
Use Spark XML version 0.3.3:
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-xml_2.10</artifactId>
  <version>0.3.3</version>
</dependency>
On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote:
> Hi Siva
>
> This is what I have for jars. Did you manage to run with these or
> different versions ?
>
>
>
> org.apache.s
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Siva [mailto:sbhavan...@gmail.com]
> *Sent:* Friday, January 29, 2016 5:40 PM
> *To:* Mohammed Guller
>
Hi Everyone,
We are using Spark 1.4.1 and we have a requirement to write data to the local
fs instead of HDFS.
When trying to save an RDD to the local fs with saveAsTextFile, it writes only
the _SUCCESS file in the folder, with no part- files, and there are no error or
warning messages on the console.
Is there any
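For reference, a minimal sketch of the call in question (the path is
illustrative). Note that with a file:// URI on a multi-node cluster each
executor writes its part- files to its own local disk, so only the driver-side
_SUCCESS marker may be visible on the machine being checked:

// illustrative: writing an RDD to the local filesystem with an explicit file:// URI
rdd.saveAsTextFile("file:///tmp/spark-output")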
Are you running Spark on a single machine?
>
>
>
> You can change Spark’s logging level to INFO or DEBUG to see what is going
> on.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
Hi Everyone,
All of a sudden we are encountering the below error from one of the Spark
consumers. It used to work without any issues before.
When I restart the consumer with the latest offsets, it works fine for
some time (it executes a few batches) and then fails again; the issue is
intermittent.
has been provided to all
> the executors in your cluster. Most of the class not found errors got
> resolved for me after making required jars available in the SparkContext.
>
> Thanks.
>
> From: Ted Yu <yuzhih...@gmail.com>
> Date: Saturday, 12 March 2016 at 7:17 AM
operation using only one task. I couldn't increase the
parallelism.
Thanks in advance
Thanks
Siva
/218816482/?action=detail&eventId=218816482
We meet every month in East Bay (Emeryville, CA). I am looking for someone
to give a talk about Spark for the next meetup (Feb 5th)
Let me know if you are interested in giving a talk.
Thanks,
-- Siva Jagadeesan
http://www.meetup.com/Bay-Area-Stream-Processing/events/219086133/
Thursday, June 4, 2015
6:45 PM
TubeMogul
http://maps.google.com/maps?f=q&hl=en&q=1250+53rd%2C+Emeryville%2C+CA%2C+94608%2C+us
1250 53rd St #1, Emeryville, CA
6:45PM to 7:00PM - Socializing
7:00PM to 8:00PM - Talks
8:00PM to
.
Thanks
Siva
I want to program in Scala for Spark.
Ref: https://issues.apache.org/jira/browse/SPARK-11953
In Spark 1.3.1 we have two methods, createJDBCTable and insertIntoJDBC.
They are replaced with write.jdbc() in Spark 1.4.1.
createJDBCTable allows performing CREATE TABLE (DDL) on the table,
followed by INSERT (DML).
insertIntoJDBC only performs the INSERT into an existing table.
Hi,
I am trying to write a DataFrame from Spark 1.4.1 to Oracle 11g.
I am using
dataframe.write.mode(SaveMode.Append).jdbc(url, tablename, properties)
but it is always trying to create a table.
I would like to insert records into an existing table instead of creating a
new one every time.
e changes will break
> Java serialization.
>
> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli <gss.su...@gmail.com>
> wrote:
>
>> hello,
>>
>> i am writing a spark streaming application to read data from kafka. I am
>> using no receiver approach and
Hello,
I am writing a Spark Streaming application to read data from Kafka. I am
using the no-receiver (direct) approach and enabled checkpointing to make sure
I am not reading messages again in case of failure (exactly-once semantics).
I have a quick question about how checkpointing needs to be configured to
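The pattern I am following so far (a sketch; the checkpoint directory and batch
interval are illustrative) creates the context in a factory function and hands
it to StreamingContext.getOrCreate so the job can recover from the checkpoint
on restart:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // illustrative path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-direct-app")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // the Kafka direct stream and its processing would be defined here
  ssc
}

val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()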
Hello,
I have my data stored in Parquet format. My data is already partitioned by
date and key. Now I want the data in each file to be sorted by a new Code column.
date1
  key1
    paqfile1
    paqfile2
  key2
    paqfile1
    paqfile2
date2
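What I am considering (a sketch; the column and path names are assumed from the
layout above) is to repartition by the existing partition columns and sort
within each partition before writing back:

import org.apache.spark.sql.functions.col

// spark is an existing SparkSession; paths are illustrative
val df = spark.read.parquet("/data/input")

df.repartition(col("date"), col("key"))
  .sortWithinPartitions(col("Code"))
  .write
  .partitionBy("date", "key")
  .parquet("/data/output_sorted")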
Hello,
I am working with Spark SQL to query a Hive managed table (in ORC format).
I have my data organized by partitions and have been asked to set indexes for
every 50,000 rows by setting ('orc.row.index.stride'='5')
Let's say -> after evaluating the partition there are around 50 files in which
the data is
92 DESC], output=[id#192])
+- ConvertToSafe
+- Project [id#192]
+- Filter (usr#199 = AA0YP)
+- HiveTableScan [id#192,usr#199], MetastoreRelation default, hlogsv5,
None, [(cdt#189 = 20171003),(usrpartkey#191 = hhhUsers)]
Please let me know if I am missing anything here. Thank you.
On Monday,
Hello Asmath,
We had a similar challenge recently.
When you write back to Hive, you are creating files on HDFS, and the number of
files depends on your batch window.
If you increase your batch window, let's say from 1 min to 5 mins, you will end
up creating 5x fewer files.
The other factor is your partitioning.
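A sketch of another knob that helped us (the file count and table name are
illustrative): coalescing each micro-batch before the write so it produces a
bounded number of files per batch:

// illustrative: cap the number of output files per micro-batch before writing to Hive
df.coalesce(4)
  .write
  .mode("append")
  .insertInto("mydb.events")   // hypothetical Hive table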
You can try with this, it will work
val finaldf = merchantdf.write.
format("org.apache.spark.sql.cassandra")
.mode(SaveMode.Overwrite)
.option("confirm.truncate", true)
.options(Map("table" -> "tablename", "keyspace" -> "keyspace"))
.save()
On Wed 27 Jun,
t it reads
> the file, but it should not read all the content, which is probably also not
> happening.
>
> On 24. Oct 2017, at 18:16, Siva Gudavalli <gudavalli.s...@yahoo.com.INVALID
> <mailto:gudavalli.s...@yahoo.com.INVALID>> wrote:
>
>>
>> Hello,
>>
ect statement. If I'm not mistaken, it is known
> as a bit costly since each call would produce a new Dataset. Defining a
> schema and using "from_json" will eliminate all the withColumn calls
> and the extra calls of "get_json_object".
>
> - Jungtaek
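A rough illustration of that suggestion (a sketch; the schema and column names
are made up for the example):

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// hypothetical schema for the JSON payload
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("name", StringType),
  StructField("status", StringType)
))

// parse the JSON column once with from_json instead of repeated get_json_object calls
val parsed = df
  .select(from_json(col("value"), schema).as("data"))
  .select("data.*")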
Hello All,
I am using Spark 2.3 and I am trying to write a Spark streaming join.
It is a basic join but it is taking a lot of time to join the stream data. I am
not sure if there is any configuration we need to set on Spark.
Code:
import org.apache.spark.sql.SparkSession
import
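For reference, a minimal stream-stream join sketch (all names are illustrative,
using the built-in rate source as a stand-in). In Spark 2.3, an event-time
watermark on both sides plus a time-range join condition is what lets the
engine bound the join state, which is often the first thing to check when such
a join keeps slowing down:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("stream-join-sketch").getOrCreate()

// stand-in sources, renamed so the two sides have distinct column names
val impressions = spark.readStream.format("rate").load()
  .withColumnRenamed("value", "impressionAdId")
  .withColumnRenamed("timestamp", "impressionTime")

val clicks = spark.readStream.format("rate").load()
  .withColumnRenamed("value", "clickAdId")
  .withColumnRenamed("timestamp", "clickTime")

// watermarks plus a time-range condition let Spark bound the join state
val joined = impressions.withWatermark("impressionTime", "10 minutes")
  .join(
    clicks.withWatermark("clickTime", "20 minutes"),
    expr("""
      clickAdId = impressionAdId AND
      clickTime >= impressionTime AND
      clickTime <= impressionTime + interval 1 hour
    """))

val query = joined.writeStream.format("console").start()
query.awaitTermination()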
Yes, I am also facing the same issue. Did you figure it out?
On Tue, 9 Jul 2019, 7:25 pm Kamalanathan Venkatesan, <
kamalanatha...@in.ey.com> wrote:
> Hello,
>
>
>
> I have below spark structural streaming code and I was expecting the
> results to be printed on the console every 10 seconds. But, I
Hi Team,
I need help with the windowing & watermark concepts. This code is not working as
expected.
package com.jiomoney.streaming
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.ProcessingTime
object SlingStreaming {
def
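A minimal sketch of the windowed-aggregation-with-watermark pattern in question
(the column names, durations, and rate source are illustrative stand-ins):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

val spark = SparkSession.builder.appName("sling-streaming-sketch").getOrCreate()

// stand-in input: any streaming DataFrame with an event-time column
val events = spark.readStream.format("rate").load()
  .withColumnRenamed("timestamp", "eventTime")

// rows arriving more than 10 minutes behind the max event time seen are dropped;
// counts are emitted per 5-minute event-time window
val counts = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window(col("eventTime"), "5 minutes"))
  .count()

val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .start()
query.awaitTermination()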
Hi Team,
I have a Spark streaming job which will read from Kafka and write into
Elastic via HTTP requests.
I want to validate each record from Kafka and change the payload as per
business needs before writing it into Elasticsearch.
I have used the ES HTTP request to push the data into Elasticsearch. Can
Hi Jainshasha,
I need to read each row from the DataFrame and make some changes to it before
inserting it into ES.
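What I have in mind is roughly this (a sketch, assuming the elasticsearch-hadoop
connector jar is on the classpath; the host, index, and column transformation
are illustrative):

import org.apache.spark.sql.functions.{col, upper}

// validate/adjust the payload per business rules (illustrative transformation)
val transformed = df
  .filter(col("payload").isNotNull)
  .withColumn("payload", upper(col("payload")))

// write to Elasticsearch via the es-hadoop connector
transformed.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-host:9200")     // illustrative host
  .option("es.resource", "myindex/_doc")  // illustrative index
  .mode("append")
  .save()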
Thanks
Siva
On Mon, Oct 5, 2020 at 8:06 PM jainshasha wrote:
> Hi Siva
>
> To emit data into ES using a Spark structured streaming job you need to use
> the ElasticSearch j
Hi all,
I am using Spark Structured Streaming (version 2.3.2). I need to read from a
Kafka cluster and write into Kerberized Kafka.
Here I want to use Kafka for offset checkpointing after the record is
written into Kerberized Kafka.
Questions:
1. Can we use Kafka for checkpointing to manage offsets
(dfKafkaPayload.select("value").as[String]).schema
But while executing the same via the Spark Streaming job, we cannot do the
above since streaming can have only one action.
Please let me know.
Thanks
Siva
monotonically_increasing_id() will give the same functionality.
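For example (a small sketch; note the generated IDs are unique and increasing
but not necessarily consecutive):

import org.apache.spark.sql.functions.monotonically_increasing_id

// df is any existing DataFrame; adds a unique, monotonically increasing 64-bit id
val withId = df.withColumn("auto_id", monotonically_increasing_id())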
On Mon, 7 Feb, 2022, 6:57 am , wrote:
> For a dataframe object, how do I add a column that is auto-incremented like
> MySQL's behavior?
>
> Thank you.
>
Hi All,
I am new to Spark and am running the Pi example on a YARN cluster. I am getting
the following exception:
Exception in thread main java.lang.NullPointerException
at
scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
at
Hi Team,
I have a Spark streaming job which I am running on a single-node
cluster. I often see the scheduling time > processing time in the streaming
statistics after a few minutes of application startup. What does that
mean? Should I increase the no. of receivers?
Regards
Taun
Hi Team,
We are not getting any error when retrieving the data from a Hive table in
PySpark, but we are getting the error (scala.MatchError: MATERIALIZED_VIEW (of
class org.apache.hadoop.hive.metastore.TableType)). Please let me know the
resolution for this.
Thanks
Hello, can I do complex data manipulations inside the groupBy function? I.e., I
want to group my whole DataFrame by a column and then do some processing for
each group.
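What I want to do looks roughly like this (a sketch; the case class, columns,
and per-group logic are made up for illustration), i.e. groupByKey followed by
mapGroups on a Dataset:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("group-processing-sketch").getOrCreate()
import spark.implicits._

case class Event(category: String, amount: Double)

val ds = Seq(Event("a", 1.0), Event("a", 2.5), Event("b", 4.0)).toDS()

// arbitrary per-group processing with groupByKey + mapGroups
val perGroup = ds
  .groupByKey(_.category)
  .mapGroups { (category, events) =>
    val values = events.map(_.amount).toSeq
    (category, values.sum / values.size)   // e.g. mean per group
  }

perGroup.show()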
Hi,
I am getting the below exception when I run spark-submit on a Linux machine. Can
someone give a quick solution with commands?
Driver stacktrace:
- Job 0 failed: count at DailyGainersAndLosersPublisher.scala:145, took
5.749450 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4
Hi,
When I am doing calculations for, say, 700 listIDs, it saves only some 50
rows and then I get some random exceptions.
I am getting the below exception when I try to do calculations on huge data and
try to save it. Please let me know if you have any suggestions.
Sample Code :
I have some