Can Spark read files from S3 that are Client-Side Encrypted with a KMS–Managed Customer Master Key (CMK)?

2018-11-01 Thread mytramesh
I am able to read S3 files that use Server-Side Encryption (SSE-KMS). I added the KMSId to the IAM role and can read them seamlessly. Recently I have been receiving S3 files that are Client-Side Encrypted (AWS KMS–Managed Customer Master Key (CMK)); when I try to read these files, the count is 0. To

Re: How to parallelize zip file processing?

2018-08-13 Thread mytramesh
Thanks for your reply. I receive the dataset from a mainframe system that I don't control. I tried the following to move data to other executors, without success: 1. Called the repartition method; the data was repartitioned, but on the same executor. Only one core is processing all these

How to parallelize zip file processing?

2018-08-10 Thread mytramesh
I know Spark doesn’t support zip files directly, since the format is not distributable. Are there any techniques to process such a file quickly? I am trying to process a zip file of around 4GB. All the data goes to one executor, and only one task is assigned to process all of it. Even when I run repartition

Re: Implementing .zip file codec

2018-08-09 Thread mytramesh
Spark doesn't support reading zip files directly, since the format is not distributable. Read the file using the java.util.zip.ZipInputStream API and prepare an RDD (4GB limit): import java.util.zip.ZipInputStream import scala.io.Source import org.apache.spark.input.PortableDataStream var zipPath = "s3://
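
The snippet above is cut off, so the poster's actual code is not shown. As a rough sketch of the ZipInputStream part only, the helper below reads every entry of a zip stream into text lines using just the JDK and the Scala standard library. The names ZipLines and readZipLines, and the UTF-8 encoding, are assumptions made for illustration and are not from the original message.

```scala
import java.io.InputStream
import java.util.zip.ZipInputStream
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

// Hypothetical helper (not from the original post).
object ZipLines {
  // Returns the text lines of every non-directory entry in a zip stream.
  // Assumes the entries are UTF-8 text.
  def readZipLines(in: InputStream): List[String] = {
    val zis = new ZipInputStream(in)
    val lines = ArrayBuffer.empty[String]
    try {
      var entry = zis.getNextEntry
      while (entry != null) {
        if (!entry.isDirectory) {
          // ZipInputStream reports end-of-stream at the end of the current
          // entry, so getLines() stops at the entry boundary. The Source is
          // deliberately not closed: closing it would close the zip stream.
          lines ++= Source.fromInputStream(zis, "UTF-8").getLines()
        }
        entry = zis.getNextEntry
      }
    } finally {
      zis.close()
    }
    lines.toList
  }
}
```

In a Spark job, one plausible use (again an assumption about the truncated code) is sc.binaryFiles(zipPath).flatMap { case (_, stream) => ZipLines.readZipLines(stream.open()) }, where stream is the PortableDataStream imported above. Each zip file is still read by a single task, so any parallelism only starts after this step.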

How to specify external jars in program with SparkConf

2018-07-12 Thread mytramesh
Context: On EMR, the classpath has an old version of a jar, and I want to refer to a new version of the jar in my code. During bootstrap, while spinning up new nodes, I copied the necessary jars from S3 to a local folder. By using the extra classpath parameter in the spark-submit command, my code is able to refer to the new version of the jar, which is