Hi all,
We are running a standalone Spark cluster for a streaming
application. The application consumes data from Flume using a Flume polling
stream created as follows:
flumeStream = FlumeUtils.createPollingStream(streamingContext,
socketAddress.toArray(new
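For context, a complete polling-stream setup, sketched here in PySpark under the assumption that the spark-streaming-flume integration is on the classpath and a Flume agent with a Spark sink is listening on a placeholder host/port, looks roughly like this:

```python
# Minimal sketch of a Flume polling stream. "flume-host" and 9988 are
# placeholders for wherever the Flume agent's spark sink is running.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName="FlumePollingExample")
ssc = StreamingContext(sc, 10)  # 10-second batches

# addresses is a list of (host, port) pairs of Flume spark sinks to poll
flume_stream = FlumeUtils.createPollingStream(ssc, [("flume-host", 9988)])
flume_stream.map(lambda event: event[1]).pprint()  # event = (headers, body)

ssc.start()
ssc.awaitTermination()
```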
That is not the right way to use watermark + append output mode. The
`withWatermark` must come before the aggregation. Something like this:
df.withWatermark("timestamp", "1 hour")
  .groupBy(window(col("timestamp"), "30 seconds"))
  .agg(...)
Read more here -
Hi All,
I am struggling with an odd issue and would like your help in addressing it.
Environment
AWS Cluster (40 Spark Nodes & 4 node Kafka cluster)
Spark Kafka Streaming submitted in Yarn cluster mode
Kafka - Single topic, 400 partitions
Spark 2.1 on Cloudera
Kafka 0.10.0 on Cloudera
We have zero
Hi ,
I have submitted a JIRA for this issue.
The link is
https://issues.apache.org/jira/browse/SPARK-21147
thanks
Fei Shao
---Original---
From: "Michael Armbrust"
Date: 2017/6/20 03:06:49
To: "??"<1427357...@qq.com>;
Cc:
Unsubscribe
Hi all,
The Column class has members named `isPartition` and `isBucket`.
What is their meaning, please?
And when should they be set to true?
Thanks in advance.
Fei Shao
Hi Burak,
Per your suggestion, I have specified
> deviceBasicAgg.withWatermark("eventtime", "30 seconds");
before invoking deviceBasicAgg.writeStream()...
But I am still facing ~
org.apache.spark.sql.AnalysisException: Append output mode not supported
when there are streaming aggregations on
And perhaps the error message can be improved here?
From: Tathagata Das
Sent: Monday, June 19, 2017 8:24:01 PM
To: kaniska Mandal
Cc: Burak Yavuz; user
Subject: Re: How save streaming aggregations on 'Structured Streams' in parquet
Hi All,
Similar to how we have a machine learning library (MLlib), do we have
anything for deep learning?
If yes, please share the details. If not then what should be the approach?
Thanks and regards,
Gaurav Pandya
Hello to all,
Below is my issue. I have already rebuilt and reimported my project in
IntelliJ IDEA, but it still gives me this error, although the Maven build
succeeds. Only IDEA reports this error. Does anyone know what is happening
here?
Thanks
Minglei
Thanks Tathagata for the pointer.
On Mon, Jun 19, 2017 at 8:24 PM, Tathagata Das
wrote:
> That is not the right way to use watermark + append output mode. The
> `withWatermark` must come before the aggregation. Something like this.
>
> df.withWatermark("timestamp", "1
Hi,
I have spark kafka streaming job running in Yarn cluster mode with
spark.task.maxFailures=4 (default)
spark.yarn.max.executor.failures=8
number of executor=1
spark.streaming.stopGracefullyOnShutdown=false
checkpointing enabled
- When there is RuntimeException in a batch in executor then
Does spark support hive table(parquet) column renaming?
Hi,
we are using Structured Streaming for stream processing, and for each
message I have to look up enrichment data from Cassandra; that data in
Cassandra is updated periodically (about once a day).
I want to look at the option of loading it as a Dataset and then registering it
Hi Cody - I do have an additional basic question ..
When I tried to compile the code in Eclipse, I was not able to do that,
e.g.
import org.apache.spark.streaming.kafka.KafkaUtils
gave errors saying KafkaUtils was not part of the package.
However, when I used sbt to compile - the compilation went
Hi,
My goal is to ~
(1) either chain streaming aggregations in a single query OR
(2) run multiple streaming aggregations and save data in some meaningful
format to execute low latency / failsafe OLAP queries
So my first choice is the Parquet format, but I failed to make it work!
I am using
Hello Ashok,
We're consuming from more than 10 topics in some Spark streaming applications.
Topic management is a concern (what is read from where, etc), but I have seen
no issues from Spark itself.
Regards,
Bryan Jeffrey
On Mon, Jun 19, 2017 at
org.apache.spark.streaming.kafka.KafkaUtils
is in the
spark-streaming-kafka-0-8
project
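In practice that means adding the matching artifact to the build; for example in sbt (the version shown is an assumption and should match the Spark release in use):

```scala
// Hypothetical coordinates; align the version with your Spark build.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.1.0"
```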
On Mon, Jun 19, 2017 at 1:01 PM, karan alang wrote:
> Hi Cody - I do have an additional basic question ..
>
> When I tried to compile the code in Eclipse, I was not able to do that
>
>
Hi Gurus,
Within one Spark Streaming process, how many topics can be handled? I have
not tried more than one topic.
Thanks
I don't think there is really a Spark-specific limit here. It would be a
function of the size of your Spark/Kafka clusters and the type of
processing you are trying to do.
On Mon, Jun 19, 2017 at 12:00 PM, Ashok Kumar
wrote:
> Hi Gurus,
>
> Within one Spark
Hi Kaniska,
In order to use append mode with aggregations, you need to set an event
time watermark (using `withWatermark`). Otherwise, Spark doesn't know when
to output an aggregation result as "final".
Best,
Burak
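Put together, a minimal sketch of this in PySpark might look as follows; the rate source (a built-in test source in newer Spark versions), the column names, and the output paths are all illustrative assumptions, not details from this thread:

```python
# Sketch: event-time watermark set BEFORE the aggregation so that append
# mode can emit "final" windowed counts.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("WatermarkExample").getOrCreate()

events = (spark.readStream
          .format("rate")            # test source producing (timestamp, value)
          .option("rowsPerSecond", 10)
          .load())

counts = (events
          .withWatermark("timestamp", "1 hour")            # watermark first
          .groupBy(window(col("timestamp"), "30 seconds"))  # then aggregate
          .count())

query = (counts.writeStream
         .outputMode("append")       # allowed because a watermark is set
         .format("parquet")
         .option("path", "/tmp/agg_out")          # placeholder paths
         .option("checkpointLocation", "/tmp/agg_chk")
         .start())
```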
On Mon, Jun 19, 2017 at 11:03 AM, kaniska Mandal
The socket source can't know how to parse your data. I think the right
thing would be for it to throw an exception saying that you can't set the
schema here. Would you mind opening a JIRA ticket?
If you are trying to parse data from something like JSON then you should
use `from_json` on the
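For illustration, parsing JSON strings read from the socket source might look like this; the schema and field names are assumptions, not details from the thread:

```python
# Sketch: the socket source yields a single string column "value";
# from_json parses it against an explicit schema. Field names are made up.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("FromJsonExample").getOrCreate()

schema = StructType([
    StructField("device", StringType()),
    StructField("ts", LongType()),
])

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost").option("port", 9999)
         .load())

parsed = (lines
          .select(from_json(col("value"), schema).alias("data"))
          .select("data.*"))
```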
thank you
in the following example
val topics = "test1,test2,test3"
val brokers = "localhost:9092"
val topicsSet = topics.split(",").toSet
val sparkConf = new SparkConf().setAppName("KafkaDroneCalc").setMaster("local") // spark://localhost:7077
val sc = new
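A complete multi-topic setup along those lines, sketched in PySpark against the Kafka 0.8 integration (names reused from the snippet above; the package must be on the classpath), could look like:

```python
# Sketch: one direct stream consuming several topics at once.
# Assumes the spark-streaming-kafka-0-8 package is available.
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName("KafkaDroneCalc").setMaster("local[2]")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)

topics = "test1,test2,test3".split(",")
stream = KafkaUtils.createDirectStream(
    ssc, topics, {"metadata.broker.list": "localhost:9092"})

stream.map(lambda kv: kv[1]).count().pprint()  # messages per batch, all topics

ssc.start()
ssc.awaitTermination()
```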
Hi!
I have a question about logs and have not found the answer on the internet.
I have a spark-submit process and I configure a custom log configuration for
it using the following params:
--conf
"spark.executor.extraJavaOptions=-Dlog4j.configuration=customlog4j.properties"
--driver-java-options
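For completeness, a typical invocation that sets a custom log4j file on both driver and executors might look like the sketch below; the paths, class name, and jar name are placeholders, and the properties file usually has to be shipped to executors with `--files`:

```shell
# Sketch: custom log4j config for driver and executors; names are placeholders.
spark-submit \
  --files customlog4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=customlog4j.properties" \
  --driver-java-options "-Dlog4j.configuration=file:customlog4j.properties" \
  --class com.example.Main \
  app.jar
```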
Some of the projects (such as spark-tags) are Java projects. Spark doesn't
fix the artifact name and just hard-codes 2.11.
For your issue, try to use `install` rather than `package`.
On Sat, Jun 17, 2017 at 7:20 PM, Kanagha Kumar
wrote:
> Hi,
>
> Bumping up again! Why does
Thanks. But, I am required to do a maven release to Nexus on spark 2.0.2
built against scala 2.10.
How can I go about this? Is this a bug that I need to file in the Spark
JIRA?
On Mon, Jun 19, 2017 at 12:12 PM, Shixiong(Ryan) Zhu <
shixi...@databricks.com> wrote:
> Some of projects (such as
Hi,
I am iteratively receiving files that can only be opened as Pandas
dataframes. For the first such file I receive, I convert it to a Spark
dataframe using the `createDataFrame` utility function. From the next file
onward, I convert each one and union it into the first Spark
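That incremental pattern might be sketched as follows in PySpark; the toy data stands in for the files received over time. Note that unioning many small DataFrames grows the query lineage, so an occasional checkpoint or cache may be worthwhile:

```python
# Sketch: fold incoming Pandas frames into one growing Spark DataFrame.
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PandasUnionExample").getOrCreate()

# Toy stand-ins for the files received over time (schemas must match).
incoming = [pd.DataFrame({"id": [1, 2], "v": [0.1, 0.2]}),
            pd.DataFrame({"id": [3], "v": [0.3]})]

combined = None
for pdf in incoming:
    sdf = spark.createDataFrame(pdf)            # Pandas -> Spark conversion
    combined = sdf if combined is None else combined.union(sdf)

# combined now holds all rows; unioning many small frames deepens the
# lineage, so periodic checkpoint()/cache() can keep plans manageable.
```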