[
https://issues.apache.org/jira/browse/SPARK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131537#comment-16131537
]
George Pongracz edited comment on SPARK-21702 at 8/18/17 12:54 AM:
-------------------------------------------------------------------
*Update:*
When "PartitionBy" is used, the data bearing files (files that contain the data
payload from the stream) written to S3 report their encryption as "-" when they
are selected with their LHS check-box in the AWS S3 GUI and viewed in the
properties section.
Data bearing files written without "PartitionBy" report their encryption as
"AES-256" when selected with their LHS check-box in the AWS S3 GUI and viewed in
the properties section.
All related non-data bearing files, irrespective of whether "PartitionBy" has
been used, report their encryption as "AES-256" when selected with their LHS
check-box and viewed in the properties section.
Clicking through the name of a single data bearing file written with
"PartitionBy" brings up a dedicated overview screen for the file, which reports
it as having AES-256 encryption. This differs from the "-" reported in the
parent screen when the file is selected using its LHS check-box.
As one can see, this labelling of encryption is inconsistent and can cause
confusion: a file that on first inspection seems unencrypted is reported as
encrypted on deeper inspection via click-through.
I think this lowers the weight of this issue and I can close it if it is deemed
a non-issue. However, it would be good if all the files written presented
consistently and correctly, whether data or non-data bearing.
I must say I spun my wheels for a time believing I had not encrypted and trying
to debug until I stumbled upon what I just described in this update.
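A way to sidestep the GUI labelling is to ask S3 directly what it records for
each object. The following is only a minimal sketch, not something taken from
this issue: it assumes a boto3-style S3 client (anything exposing
`get_paginator("list_objects_v2")` and `head_object`), and the helper names
`iter_keys`, `sse_report` and `unencrypted` are made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List


def iter_keys(client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def sse_report(client: Any, bucket: str, keys: Iterable[str]) -> Dict[str, str]:
    """Map each key to the SSE value in its HEAD response, or "-" if absent."""
    report: Dict[str, str] = {}
    for key in keys:
        head = client.head_object(Bucket=bucket, Key=key)
        # "ServerSideEncryption" is only present when SSE was applied.
        report[key] = head.get("ServerSideEncryption", "-")
    return report


def unencrypted(report: Dict[str, str]) -> List[str]:
    """Return the keys for which no SSE value was reported, sorted."""
    return sorted(key for key, sse in report.items() if sse == "-")
```

With boto3 installed and credentials configured, this could be driven with
`client = boto3.client("s3")` followed by
`sse_report(client, "my-bucket", iter_keys(client, "my-bucket", "output/"))`,
where the bucket and prefix are placeholders. An empty result from
`unencrypted` would suggest the "-" in the list view is a display matter only.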
was (Author: gpongracz):
*Update:*
The data bearing files (files that contain the data payload from the stream)
written to S3 report their encryption as "-" when they are selected with their
LHS check-box in the AWS S3 GUI and viewed in the properties section.
All related non-data bearing files report their encryption as "AES-256" when
selected with their LHS check-box and viewed in the properties section.
Clicking through the name of a single data bearing file brings up a dedicated
overview screen for the file, which reports it as having AES-256 encryption.
As one can see, this labelling of encryption is inconsistent and can cause
confusion: a file that on first inspection seems unencrypted is reported as
encrypted on deeper inspection via click-through.
I think this lowers the weight of this issue and I can close it if it is deemed
a non-issue. However, it would be good if all the files written presented
consistently and correctly, whether data or non-data bearing.
I must say I lost a bit of time believing I had not encrypted and trying to
debug until I stumbled upon what I just described in this update.
This obviously only happens when PartitionBy is used.
> Structured Streaming S3A SSE Encryption Not Visible through AWS S3 GUI when
> PartitionBy Used
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-21702
> URL: https://issues.apache.org/jira/browse/SPARK-21702
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.2.0
> Environment: Hadoop 2.7.3: AWS SDK 1.7.4
> Hadoop 2.8.1: AWS SDK 1.10.6
> Reporter: George Pongracz
> Priority: Minor
> Labels: security
>
> Settings:
> .config("spark.hadoop.fs.s3a.impl",
> "org.apache.hadoop.fs.s3a.S3AFileSystem")
> .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm",
> "AES256")
> When writing to an S3 sink from structured streaming, the files are encrypted
> using AES-256.
> When a "PartitionBy" is introduced, the output data files are unencrypted.
> All other supporting files and metadata are encrypted.
> Suspect the write to temp is encrypted and the move/rename is not applying
> the SSE.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)