Re: Spark Conf

2018-03-15 Thread Neil Jonkers
Hi "In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file." https://spark.apache.org/docs/latest/submitting-applications.html Perhaps this will help Vinyas: Look at args.sparkProperties in
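
As a rough illustration of the precedence order quoted above, the lookup can be modelled with three plain dictionaries. This is only a sketch of the rule, not Spark's actual implementation: the function name `resolve_spark_property` and the sample values are hypothetical.

```python
from typing import Dict, Optional


def resolve_spark_property(
    key: str,
    spark_conf: Dict[str, str],
    submit_flags: Dict[str, str],
    defaults_file: Dict[str, str],
) -> Optional[str]:
    """Return the effective value for `key` under the documented precedence.

    Hypothetical helper: the sources are checked from highest to lowest
    precedence (SparkConf, then spark-submit flags, then the defaults file).
    """
    for source in (spark_conf, submit_flags, defaults_file):
        if key in source:
            return source[key]
    return None


# Sample values, chosen only for illustration.
defaults_file = {"spark.executor.memory": "2g", "spark.eventLog.enabled": "true"}
submit_flags = {"spark.executor.memory": "4g", "spark.executor.cores": "2"}
spark_conf = {"spark.executor.memory": "8g"}

for name in ("spark.executor.memory", "spark.executor.cores", "spark.eventLog.enabled"):
    print(name, "=", resolve_spark_property(name, spark_conf, submit_flags, defaults_file))
```

A value set in the application code wins over the same key passed on the command line, which is why a setting that seems to be ignored by spark-submit is often being overridden in the SparkConf.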

Re: [Spark Core] excessive read/load times on parquet files in 2.2 vs 2.0

2017-09-08 Thread Neil Jonkers
Can you provide a code sample please? On Fri, Sep 8, 2017 at 5:44 PM, Matthew Anthony wrote: > Hi all - > > > since upgrading to 2.2.0, we've noticed a significant increase in > read.parquet(...) ops. The parquet files are being read from S3. Upon entry > at the interactive

Re: Looking at EMR Logs

2017-03-31 Thread Neil Jonkers
If you modify spark.eventLog.dir to point to an S3 path, you will encounter the following exception in the Spark history server log at /var/log/spark/spark-history-server.out: Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found

Re: spark 2.02 error when writing to s3

2017-01-20 Thread Neil Jonkers
Can you test by enabling EMRFS consistent view and using an s3:// URI? http://docs.aws.amazon.com/emr/latest/ManagementGuide/enable-consistent-view.html Original message From: Steve Loughran Date:20/01/2017 21:17 (GMT+02:00) To: "VND Tremblay, Paul"

Re: Running Spark on EMR

2017-01-15 Thread Neil Jonkers
Hello, Can you drop the URL spark://master:7077? That URL is only used when running Spark in standalone mode. Regards Original message From: Marco Mistroni Date:15/01/2017 16:34 (GMT+02:00) To: User Subject: Running Spark on EMR

Re: [Spark Core] - Spark dynamoDB integration

2016-12-12 Thread Neil Jonkers
Hello, Good examples on how to interface with DynamoDB from Spark here: https://aws.amazon.com/blogs/big-data/using-spark-sql-for-etl/ https://aws.amazon.com/blogs/big-data/analyze-your-data-on-amazon-dynamodb-with-apache-spark/ Thanks On Mon, Dec 12, 2016 at 7:56 PM, Marco Mistroni

Re: spark 1.4.1 saveAsTextFile is slow on emr-4.0.0

2015-09-02 Thread Neil Jonkers
Hi, Can you set the following parameters in your mapred-site.xml file please: mapred.output.direct.EmrFileSystem=true mapred.output.direct.NativeS3FileSystem=true You can also configure this at cluster launch time with the following Classification via EMR console:
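
The classification itself is cut off in this snippet. As a hedged sketch only, the two properties named above could be expressed in EMR's general configuration JSON shape (a list of objects with "Classification" and "Properties" keys). The use of the "mapred-site" classification name here is an assumption based on the file mentioned in the message, not a copy of the original text.

```python
import json

# The two properties named in the message, as string values.
direct_output_properties = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}

# Assumption: the target classification is "mapred-site", matching the
# mapred-site.xml file the message refers to.
config = [
    {
        "Classification": "mapred-site",
        "Properties": direct_output_properties,
    }
]

# Print JSON that can be pasted into a cluster configuration field.
print(json.dumps(config, indent=2))
```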