date:20170831

running dockerized spark applications in DC/OS

2017-08-31 Thread sujeet jog

Folks, Does any body have production experience in running dockerized spark application on DC/OS, and can the spark cluster run other than spark stand alone mode ?.. What are the major differences between running spark with Mesos Cluster manager Vs running Spark as dockerized container under

Re: Checkpointing With NOT Serializable

2017-08-31 Thread ayan guha

Any help on this? On Thu, Aug 31, 2017 at 10:30 AM, ayan guha wrote: > Hi > > Want to understand a basic issue. Here is my code: > > def createStreamingContext(sparkCheckpointDir: String,batchDuration: Int > ) = { > > val ssc = new StreamingContext(spark.sparkContext,

[SPARK-SQL] Spark Persist slower than non-persist calls

2017-08-31 Thread saurabh raval

Spark 2.1 My settings are: Running Spark 2.1 on 3 node YARN cluster with 160 GB. Dynamic allocation turned on. spark.executor.memory=6G, spark.executor.cores=6 First, I am reading hive tables: orders(329MB) and lineitems(1.43GB) and doing left outer join.Next, I apply 7 different filter

Cloudwatch metrics sink problem

2017-08-31 Thread Mikhailau, Alex

I am getting the following in the logs: Sink class org.apache.spark.metrics.sink.CloudwatchSink cannot be instantiated due to CloudwatchSink ClassNotFoundException. I am running this on EMR 5.7.0. Does anyone have experience adding this sink to an EMR cluster? Thanks, Alex

Re: [Structured Streaming]Data processing and output trigger should be decoupled

2017-08-31 Thread 张万新

I think something like state store can be used to keep the intermediate data. For aggregations the engines keeps processing batches of data and update the results in state store(or sth like this), and when a trigger begins the engines just fetch the current result from state store and output it to

Inconsistent results with combineByKey API

2017-08-31 Thread Swapnil Shinde

Hello All I am observing some strange results with aggregateByKey API which is implemented with combineByKey. Not sure if this is by design or bug - I created this toy example but same problem can be observed on large datasets as well - *case class ABC(key: Int, c1: Int, c2: Int)* *case class

Delay in Initial Execution Time in Spark for Benchmarking Parquet

2017-08-31 Thread austinldv

Hello, We are running few Star Schema Benchmarks on parquet performance in Spark. We have set up the below functions (shown at the bottom) for getting runtimes. This is a simplified version of spark's benchmark API. The benchmarks are called with 1 warmup and 10 runs. SSB link:

Re: Issue with Spark Twitter Streaming

2017-08-31 Thread grumi

hey, did you manage to solve the problem? I have exactly the same problem and I am not able to solve it. Thank you! -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail:

Re: Failing jobs with Spark 2.2 running on Yarn with HDFS

2017-08-31 Thread Jan-Hendrik Zab

This has indeed been caused by the network backend that dropped several outgoing packets. I'm not sure why this wasn't "caught" by TCP. We ended up with setting send_queue_size=256 recv_queue_size=512 for ib_ipoib and krcvqs=4 fpr hfi1. We also updated our OmniPath switch firmware to the current

[Structured Streaming]Usage of watermark

2017-08-31 Thread KevinZwx

Hi, I'm a little confused about the usage of watermark in SS. According to the guideline, when we use a window-based grouping, SS will automatically handle the late event and we should use watermark to limit the state like this(specify a watermark before groupBy): val words = ... // streaming

Re: Why do checkpoints work the way they do?

2017-08-31 Thread 张万新

So is there any documents demonstrating in what condition can my application recover from the same checkpoint and in what condition not? Tathagata Das 于2017年8月30日周三下午1:20写道： > Hello, > > This is an unfortunate design on my part when I was building DStreams :) > >

running dockerized spark applications in DC/OS

Re: Checkpointing With NOT Serializable

[SPARK-SQL] Spark Persist slower than non-persist calls

Cloudwatch metrics sink problem

Re: [Structured Streaming]Data processing and output trigger should be decoupled

Inconsistent results with combineByKey API

Delay in Initial Execution Time in Spark for Benchmarking Parquet

Re: Issue with Spark Twitter Streaming

Re: Failing jobs with Spark 2.2 running on Yarn with HDFS

[Structured Streaming]Usage of watermark

Re: Why do checkpoints work the way they do?

11 matches

Site Navigation

Mail list logo

Footer information