Re: --jars from spark-submit on master on YARN don't get added properly to the executors - ClassNotFoundException

2017-08-09 Thread Mikhailau, Alex
…y of the jar. On Wed, Aug 9, 2017 at 2:52 PM, Mikhailau, Alex <alex.mikhai...@mlb.com> wrote: > I have log4j JSON layout jars added via spark-submit on EMR: > /usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --jars > /

DStream Spark 2.1.1 Streaming on EMR at scale - long running job fails after two hours

2017-07-26 Thread Mikhailau, Alex
Guys, I am trying hard to make a DStream API Spark Streaming job work on EMR. I’ve gotten it to the point where it runs for a few hours, but it eventually fails, and that is when I start seeing out-of-memory exceptions in the aggregated “yarn logs” output. I am doing a JSON map and extraction of some

--jars from spark-submit on master on YARN don't get added properly to the executors - ClassNotFoundException

2017-08-09 Thread Mikhailau, Alex
I have log4j JSON layout jars added via spark-submit on EMR: /usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --jars /home/hadoop/lib/jsonevent-layout-1.7.jar,/home/hadoop/lib/json-smart-1.1.1.jar --driver-java-options "-XX:+AlwaysPreTouch -XX:MaxPermSize=6G" --class
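One commonly suggested workaround for this kind of ClassNotFoundException (an assumption here, not a resolution confirmed in this thread) is that log4j resolves its layout class while the driver or executor JVM is starting, before the `--jars` entries are visible to it, so the jars also need to be on the JVM's startup classpath. As far as I know, on YARN each `--jars` file is localized into the container's working directory under its base name, so that base name can be listed in `spark.driver.extraClassPath` and `spark.executor.extraClassPath`. The Python sketch below only assembles such a command line; `build_spark_submit`, the main class and the application jar are hypothetical placeholders. Because EMR's `spark-defaults.conf` may already set these classpath properties, and `--conf` replaces rather than appends to them, the helper accepts an existing value to keep.

```python
import os
import shlex
from typing import List, Optional


def build_spark_submit(main_class: str, app_jar: str, jars: List[str],
                       existing_classpath: Optional[str] = None) -> List[str]:
    """Hypothetical helper: ship `jars` with --jars AND put them on the
    driver/executor JVM startup classpath by base name.

    Assumption: on YARN, every --jars file is localized into the container's
    working directory under its base name, so a relative entry resolves there.
    """
    names = [os.path.basename(j) for j in jars]
    # Keep whatever classpath the cluster already configures (e.g. EMR defaults).
    entries = names + ([existing_classpath] if existing_classpath else [])
    classpath = ":".join(entries)
    return [
        "/usr/lib/spark/bin/spark-submit",
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--jars", ",".join(jars),
        "--conf", f"spark.driver.extraClassPath={classpath}",
        "--conf", f"spark.executor.extraClassPath={classpath}",
        "--class", main_class,
        app_jar,
    ]


if __name__ == "__main__":
    argv = build_spark_submit(
        "com.example.Main",        # hypothetical main class
        "/home/hadoop/app.jar",    # hypothetical application jar
        ["/home/hadoop/lib/jsonevent-layout-1.7.jar",
         "/home/hadoop/lib/json-smart-1.1.1.jar"],
    )
    print(shlex.join(argv))
```

Returning an argument list rather than a single string keeps it usable with `subprocess.run` without any extra quoting.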

Re-sharded kinesis stream starts generating warnings after kinesis shard numbers were doubled

2017-09-13 Thread Mikhailau, Alex
Has anyone seen the following warnings in the log after a kinesis stream has been re-sharded? com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask WARN Cannot get the shard for this ProcessTask, so duplicate KPL user records in the event of resharding will not be dropped during

Re: Re-sharded kinesis stream starts generating warnings after kinesis shard numbers were doubled

2017-10-04 Thread Mikhailau, Alex
Filed SPARK-22200. From: "Mikhailau, Alex" <alex.mikhai...@mlb.com> Date: Wednesday, October 4, 2017 at 10:43 AM To: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Re-sharded kinesis stream starts generating warnings after kinesis shard numbers w

Re: Re-sharded kinesis stream starts generating warnings after kinesis shard numbers were doubled

2017-10-04 Thread Mikhailau, Alex
-4454 With 2.2.0 -Alex From: "Mikhailau, Alex" <alex.mikhai...@mlb.com> Date: Wednesday, September 13, 2017 at 4:16 PM To: "user@spark.apache.org" <user@spark.apache.org> Subject: Re-sharded kinesis stream starts generating warnings after kinesis shard numbe

Re: Referencing YARN application id, YARN container hostname, Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements?

2017-08-29 Thread Mikhailau, Alex
cheduler@:38151 --executor-id 3 --hostname ip-1… And it gets printed out in the container log: > 17/08/29 13:02:00 INFO Executor: Starting executor ID 3 on host … On Mon, Aug 28, 2017 at 5:41 PM, Mikhailau, Alex <alex.mikhai...@mlb.com> wrote: Thanks, Vadim. Th

Spark 2.1.1 with Kinesis Receivers is failing to launch 50 active receivers with oversized cluster on EMR Yarn

2017-09-05 Thread Mikhailau, Alex
Guys, I have a Spark 2.1.1 job with Kinesis that is failing to launch 50 active receivers, even with an oversized cluster on EMR YARN. Sometimes it registers 16 receivers, sometimes 32, other times 48, but not all 50. Any help would be greatly appreciated. Kinesis stream shards = 500; YARN EMR
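Two constraints from the Spark documentation are worth checking against these numbers: the Streaming programming guide says each receiver occupies one core for the life of the job, so the application needs more cores than receivers, and the Kinesis integration guide says the Kinesis input DStreams of one application balance the stream's shards among themselves. The back-of-the-envelope Python sketch below checks both; `receiver_budget` is a hypothetical helper, and the executor count and cores per executor are placeholders, since the cluster size in the message above is cut off.

```python
from dataclasses import dataclass


@dataclass
class ReceiverBudget:
    shards_per_receiver: float
    total_cores: int
    cores_left_for_processing: int
    fits: bool


def receiver_budget(shards: int, receivers: int, executors: int,
                    cores_per_executor: int) -> ReceiverBudget:
    """Each receiver pins one executor core for the lifetime of the job,
    so the cluster needs strictly more cores than receivers in order to
    still run the batch-processing tasks."""
    total = executors * cores_per_executor
    left = total - receivers
    return ReceiverBudget(
        shards_per_receiver=shards / receivers,  # shards are balanced across receivers
        total_cores=total,
        cores_left_for_processing=left,
        fits=left > 0,
    )


if __name__ == "__main__":
    # Placeholder cluster shape: 10 executors x 8 cores.
    print(receiver_budget(shards=500, receivers=50, executors=10, cores_per_executor=8))
```

With 500 shards and 50 receivers, each receiver ends up handling about 10 shards, and in the placeholder shape above 30 cores remain for processing.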

spark metrics prefix in Graphite is duplicated

2017-09-06 Thread Mikhailau, Alex
Hi guys, When I set up my EMR cluster with Spark, I add "*.sink.graphite.prefix": "$env.$namespace.$team.$app" to metrics.properties. The cluster comes up with the correct metrics.properties. Then I simply add a step to EMR with spark-submit, without any metrics namespace parameter. In my Graphite,
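To reason about where a repeated segment could come from, it helps to know how Spark 2.x names the driver and executor metrics that the Graphite sink reports. As far as I can tell from `MetricsSystem`, they are registered as `<namespace>.<executorId>.<sourceName>.<metric>`, where the namespace is `spark.metrics.namespace` when set and the application ID otherwise, and `GraphiteSink` then prepends its configured `prefix`. The Python function below is an illustrative sketch of that composition, not Spark code; `graphite_metric_path` is a hypothetical name and the sample values are placeholders.

```python
from typing import Optional


def graphite_metric_path(prefix: str, app_id: str, executor_id: str,
                         source: str, metric: str,
                         metrics_namespace: Optional[str] = None) -> str:
    """Sketch of the full Graphite path for a driver/executor metric.

    Assumption (mirrors Spark 2.x as I understand it):
      registry name = <spark.metrics.namespace or spark.app.id>.<executor id>.<source>.<metric>
      Graphite path = <sink prefix>.<registry name>
    """
    namespace = metrics_namespace if metrics_namespace else app_id
    parts = [prefix, namespace, executor_id, source, metric]
    # Skip empty parts (e.g. no prefix configured) so no stray dots appear.
    return ".".join(p for p in parts if p)


if __name__ == "__main__":
    print(graphite_metric_path("prod.ns.team.app", "application_1504700000000_0001",
                               "driver", "jvm", "heap.used"))
```

If `spark.metrics.namespace` happens to repeat the last part of the prefix, that segment appears twice in a row (for example `prod.ns.team.app.app.1.jvm.heap.used`), so comparing a reported path against this shape shows which setting contributes each part.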

How do I create a JIRA issue and associate it with a PR that I created for a bug in master?

2017-09-12 Thread Mikhailau, Alex
How do I create a JIRA issue and associate it with a PR that I created for a bug in master? https://github.com/apache/spark/pull/19210
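The Spark contribution guide asks that pull request titles take the form `[SPARK-xxxx][COMPONENT] Title`, and says a PR titled that way is automatically linked to the named JIRA issue. So the usual flow is to create the issue in the SPARK project on the Apache JIRA first, then edit the PR title to start with its key. The small Python helper below only formats and sanity-checks such a title; `spark_pr_title` is hypothetical, and the issue number, component and summary in the usage line are placeholders, not the real issue for PR 19210.

```python
import re

# Expected shape, e.g. "[SPARK-12345][CORE] Fix something"
_TITLE_RE = re.compile(r"^\[SPARK-\d+\](\[[A-Z0-9 ._-]+\])+ \S.*$")


def spark_pr_title(jira_number: int, component: str, summary: str) -> str:
    """Format a PR title in the [SPARK-xxxx][COMPONENT] Title convention."""
    title = f"[SPARK-{jira_number}][{component.upper()}] {summary.strip()}"
    if not _TITLE_RE.match(title):
        raise ValueError(f"unexpected title shape: {title!r}")
    return title


if __name__ == "__main__":
    # Placeholder values only.
    print(spark_pr_title(12345, "dstreams", "Fix receiver leak"))
```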

Referencing YARN application id, YARN container hostname, Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements?

2017-08-28 Thread Mikhailau, Alex
Does anyone have a working solution for logging YARN application id, YARN container hostname, Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements? Are there specific environment variables available, or another workflow for doing that? Thank you, Alex
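YARN exports a few environment variables into every container it launches, among them `CONTAINER_ID` and `NM_HOST` (the NodeManager's host). A container ID embeds the cluster timestamp, application sequence number, attempt number and container number, so the application ID and attempt can be derived from it; the Spark executor ID is not part of it (the 2017-08-29 reply shows Spark passing it to the executor as `--executor-id`). The Python sketch below assumes the usual `container_[e<epoch>_]<clusterTimestamp>_<app>_<attempt>_<container>` format; `parse_container_id` and `current_yarn_context` are hypothetical helpers, not a Spark or YARN API.

```python
import os
import re
from typing import Dict, Optional

# Assumed format: container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attemptSeq>_<containerSeq>
_CONTAINER_RE = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?(?P<ts>\d+)_(?P<app>\d+)"
    r"_(?P<attempt>\d+)_(?P<container>\d+)$"
)


def parse_container_id(container_id: str) -> Dict[str, object]:
    """Split a YARN container ID into application ID, attempt and container number."""
    m = _CONTAINER_RE.match(container_id)
    if m is None:
        raise ValueError(f"not a YARN container id: {container_id!r}")
    return {
        # Keep the digits as strings so zero padding ("0001") is preserved.
        "application_id": f"application_{m.group('ts')}_{m.group('app')}",
        "attempt": int(m.group("attempt")),
        "container": int(m.group("container")),
        "epoch": int(m.group("epoch")) if m.group("epoch") else None,
    }


def current_yarn_context() -> Optional[Dict[str, object]]:
    """Read CONTAINER_ID / NM_HOST from the environment; None outside YARN."""
    cid = os.environ.get("CONTAINER_ID")
    if not cid:
        return None
    ctx = parse_container_id(cid)
    ctx["host"] = os.environ.get("NM_HOST")
    return ctx
```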

Re: Referencing YARN application id, YARN container hostname, Executor ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements?

2017-08-28 Thread Mikhailau, Alex
… Is there an MDC-based way to do this with Spark, or some other way to achieve it? Alex From: Vadim Semenov <vadim.seme...@datadoghq.com> Date: Monday, August 28, 2017 at 5:18 PM To: "Mikhailau, Alex" <alex.mikhai...@mlb.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Sub
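Log4j's MDC has a rough counterpart in Python's `logging` module: a `logging.Filter` can attach extra attributes to every record, and the format string can then reference them. The sketch below shows that pattern for Python code running in a YARN container (for example PySpark driver code); it is an analogue, not the JVM-side log4j MDC the question asks about. Reading `CONTAINER_ID` and `NM_HOST` is an assumption about what YARN exports, `YarnContextFilter` and `make_logger` are hypothetical names, and missing values fall back to `-`.

```python
import logging
import os
from typing import Optional, TextIO


class YarnContextFilter(logging.Filter):
    """Attach YARN container details to every log record, MDC-style."""

    def __init__(self) -> None:
        super().__init__()
        # Read once: these values do not change for the life of a container.
        self.container_id = os.environ.get("CONTAINER_ID", "-")
        self.nm_host = os.environ.get("NM_HOST", "-")

    def filter(self, record: logging.LogRecord) -> bool:
        record.container_id = self.container_id
        record.nm_host = self.nm_host
        return True  # never drop records, only enrich them


def make_logger(name: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Logger whose lines carry [container@host]; stream defaults to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s [%(container_id)s@%(nm_host)s] %(name)s: %(message)s"))
    handler.addFilter(YarnContextFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Attaching the filter to the handler rather than the logger means records from child loggers that propagate into this handler are enriched too.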

Cloudwatch metrics sink problem

2017-08-31 Thread Mikhailau, Alex
I am getting the following in the logs: Sink class org.apache.spark.metrics.sink.CloudwatchSink cannot be instantiated due to CloudwatchSink ClassNotFoundException. I am running this on EMR 5.7.0. Does anyone have experience adding this sink to an EMR cluster? Thanks, Alex

Does the Kinesis connector for Structured Streaming auto-scale receivers if a cluster is using dynamic allocation and auto-scaling?

2018-02-01 Thread Mikhailau, Alex
Does the Kinesis connector for Structured Streaming auto-scale receivers if a cluster is using dynamic allocation and auto-scaling?