I am able to read S3 files that are server-side encrypted (SSE-KMS): after adding the KMS key ID to the IAM role, reads work seamlessly.
Recently I have started receiving S3 files that are client-side encrypted with an AWS KMS–managed customer master key (CMK). When I try to read these files, the count is 0.
Thanks for your reply. The dataset comes from a mainframe system that I don't control.
I tried the following to move the data to other executors, without success:
1. Called the repartition method; the data was repartitioned, but onto the same executor, and only one core is processing all of it.
I know Spark doesn't support zip files directly, since the format is not splittable. Are there any techniques to process this file quickly?
I am trying to process a roughly 4 GB zip file. All the data goes to one executor, and only one task is assigned to process it, even when I call repartition.
Spark doesn't support reading zip files directly, since the format is not splittable.
Read the archive with the java.util.zip.ZipInputStream API and build an RDD from it (4 GB limit):
import java.util.zip.ZipInputStream
import scala.io.Source
import org.apache.spark.input.PortableDataStream

val zipPath = "s3://
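The core of this approach can be sketched in plain Java, using the same java.util.zip calls; the sample archive, entry name, and the readAllLines helper are illustrative, not from the original post. In Spark terms, readAllLines is the work a single task would perform on the stream obtained from sc.binaryFiles, which is why one non-splittable archive ends up on one task.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

public class ZipLines {
    /** Read every line of every entry from a zip presented as one stream.
     *  The archive itself cannot be split, so a single reader must walk
     *  it sequentially, entry by entry. */
    public static List<String> readAllLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                // New reader per entry: ZipInputStream reports end-of-entry as EOF.
                BufferedReader r = new BufferedReader(
                        new InputStreamReader(zis, StandardCharsets.UTF_8));
                String line;
                while ((line = r.readLine()) != null) lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Build a small zip in memory so the example is self-contained. */
    public static byte[] sampleZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("data.txt"));
            zos.write("row1\nrow2\nrow3".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        List<String> lines = readAllLines(new ByteArrayInputStream(sampleZip()));
        System.out.println(lines); // prints [row1, row2, row3]
    }
}
```

Once the lines are extracted this way inside a flatMap, a subsequent repartition does redistribute them for downstream stages; it is only the initial decompression that is pinned to one task.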
Context: on EMR, the classpath contains an old version of a jar, and I want my code to use a newer version.
Through a bootstrap action run while spinning up new nodes, I copied the necessary jars from S3 to a local folder.
By using the extra-classpath parameter in the spark-submit command, my code is able to use the new version of the jar from that local folder.
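For reference, a minimal sketch of such a spark-submit invocation; the jar folder /home/hadoop/extrajars, the main class, and the application jar name are placeholders, not taken from the original post. Entries in spark.driver.extraClassPath and spark.executor.extraClassPath are prepended to the respective classpaths, which is what lets the newer jar shadow the version bundled on EMR.

```
# Prepend locally staged jars (copied by the bootstrap action) to the
# driver and executor classpaths. Paths and class name are placeholders.
spark-submit \
  --conf spark.driver.extraClassPath=/home/hadoop/extrajars/* \
  --conf spark.executor.extraClassPath=/home/hadoop/extrajars/* \
  --class com.example.MyJob \
  my-job.jar
```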