[
https://issues.apache.org/jira/browse/HADOOP-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192199#comment-17192199
]
Steve Loughran commented on HADOOP-16942:
-----------------------------------------
One more point: you said "EMR version - 6.0.0".
Amazon EMR's S3 connector is their own closed-source client, so please take
this up through your AWS support channel. Thanks.
> S3A creating folder level delete markers
> ----------------------------------------
>
> Key: HADOOP-16942
> URL: https://issues.apache.org/jira/browse/HADOOP-16942
> Project: Hadoop Common
> Issue Type: Task
> Components: fs/s3
> Affects Versions: 2.8.3, 3.2.1
> Reporter: vijayant soni
> Priority: Major
>
> Using the S3A URL scheme while writing data from Spark to S3 creates many
> folder-level delete markers.
> Writing the same data with the S3 URL scheme does not create any delete
> markers at all.
>
> Spark - 2.4.4
> Hadoop - 3.2.1
> EMR version - 6.0.0
> Write Mode - Append
> {code:scala}
> [hadoop@ip-192-0-161-212 ~]$ spark-shell
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
> setLogLevel(newLevel).
> 20/03/27 07:37:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive
> is set, falling back to uploading libraries under SPARK_HOME.
> Spark context Web UI available at http://ip-192-0-161-212.ec2.internal:4040
> Spark context available as 'sc' (master = yarn, app id =
> application_1585294390030_0003).
> Spark session available as 'spark'.
> Welcome to
> ____ __
> / __/__ ___ _____/ /__
> _\ \/ _ \/ _ `/ __/ '_/
> /___/ .__/\_,_/_/ /_/\_\ version 2.4.4
> /_/
>
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> val df = spark.sql("select 1 as a")
> df: org.apache.spark.sql.DataFrame = [a: int]
> scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3://my-bucket/tmp/vijayant/test/s3/")
>
>
> scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3a://my-bucket/tmp/vijayant/test/s3a/")
>
>
> scala>
> {code}
> Getting delete markers from `s3` write
> {code:bash}
> aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3/
> {
> "Versions": [
> {
> "LastModified": "2020-03-27T07:38:17.000Z",
> "VersionId": "V06OzeE7j221Tq7keSGj8bveCYyJFIcf",
> "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
> "StorageClass": "STANDARD",
> "Key": "tmp/vijayant/test/s3/_SUCCESS",
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "Size": 0
> },
> {
> "LastModified": "2020-03-27T07:38:16.000Z",
> "VersionId": "dLYtHDugLhFIdw2YHLFmoFOxXkm.21Wo",
> "ETag": "\"26e70a1e26c709e3e8498acd49cfaaa3-1\"",
> "StorageClass": "STANDARD",
> "Key":
> "tmp/vijayant/test/s3/part-00000-9d9a8925-f119-415d-b547-b742396e2ca7-c000.snappy.parquet",
>
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "Size": 384
> }
> ]
> }
> {code}
> Getting delete markers from `s3a` write
> {code:bash}
> aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3a/
> {
> "DeleteMarkers": [
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "VersionId": "NJWRZMcb_eYYwCJh_isX4H1Ox6W362Wb",
> "Key": "tmp/vijayant/test/s3a/",
> "LastModified": "2020-03-27T07:39:11.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "F0h0mLcVVwkMtcHxd95Hj7BACL4Up_Q9",
> "Key": "tmp/vijayant/test/s3a/",
> "LastModified": "2020-03-27T07:39:10.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": ".sBcE6cXeggekOnSgZ4n7pyCDHnsLERK",
> "Key": "tmp/vijayant/test/s3a/",
> "LastModified": "2020-03-27T07:39:10.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "nzm39jiUPC4H0ZaS.5Shp0FYPnR8wNf9",
> "Key": "tmp/vijayant/test/s3a/",
> "LastModified": "2020-03-27T07:39:09.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "BPM65R1HkZngPDYtDL3zPZYPw_G_m9Ic",
> "Key": "tmp/vijayant/test/s3a/",
> "LastModified": "2020-03-27T07:39:08.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "VersionId": "LJt8_MVDOiD4UdgUqEMycxjvtinJlTNt",
> "Key": "tmp/vijayant/test/s3a/_temporary/",
> "LastModified": "2020-03-27T07:39:11.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "RqunJTn8Od0PgFR4yu44PX4kL54k6EDv",
> "Key": "tmp/vijayant/test/s3a/_temporary/",
> "LastModified": "2020-03-27T07:39:09.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "4vY8cnqUI5VJAk3VfEt_VD_KEczo3bmY",
> "Key": "tmp/vijayant/test/s3a/_temporary/",
> "LastModified": "2020-03-27T07:39:08.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "VersionId": "ln47YYy.yiE.k70cvqvfgYCEQoYFnKQW",
> "Key": "tmp/vijayant/test/s3a/_temporary/0/",
> "LastModified": "2020-03-27T07:39:11.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "5Bsrt7s1caM90mzGNgk0MsTU9q8UjTTA",
> "Key": "tmp/vijayant/test/s3a/_temporary/0/",
> "LastModified": "2020-03-27T07:39:09.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "VersionId": "pN3HzDfnmqIqrMwAL2jqKEBkvoHZALor",
> "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/",
> "LastModified": "2020-03-27T07:39:11.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "wg91poO1KXReXxvsZHzZXrHR1IgIX8t2",
> "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/",
> "LastModified": "2020-03-27T07:39:09.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "VersionId": "cv5Noykq3sMilQqJXAH3E.N7qAWnIBx7",
> "Key":
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/",
>
> "LastModified": "2020-03-27T07:39:11.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "VersionId": "6xzt9SxlCUJaOLD8krkE3yXfQU14rErX",
> "Key":
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/",
>
> "LastModified": "2020-03-27T07:39:09.000Z"
> },
> {
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "VersionId": "wGmJAo7x_gkLWAiHzxPGdPMVSus7Wcp1",
> "Key":
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet",
>
> "LastModified": "2020-03-27T07:39:10.000Z"
> }
> ],
> "Versions": [
> {
> "LastModified": "2020-03-27T07:39:11.000Z",
> "VersionId": "2py_ZXKl7yh6fwhzksAx8Os1BriDJCBb",
> "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
> "StorageClass": "STANDARD",
> "Key": "tmp/vijayant/test/s3a/_SUCCESS",
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "Size": 0
> },
> {
> "LastModified": "2020-03-27T07:39:08.000Z",
> "VersionId": "lDqTnLCqDYtjrOiY.V7E6AKTRQLKrqUT",
> "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
> "StorageClass": "STANDARD",
> "Key": "tmp/vijayant/test/s3a/_temporary/0/",
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "Size": 0
> },
> {
> "LastModified": "2020-03-27T07:39:10.000Z",
> "VersionId": "g.rGoTDdmrGrNjrLchvwz3jMmGePkgiD",
> "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
> "StorageClass": "STANDARD",
> "Key":
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/",
>
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "Size": 0
> },
> {
> "LastModified": "2020-03-27T07:39:09.000Z",
> "VersionId": ".ZCpY2UW4hRlbLL87dFUJRuk021Hyq8p",
> "ETag": "\"3def7238a0858c17c62d7045290175cf\"",
> "StorageClass": "STANDARD",
> "Key":
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet",
>
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": false,
> "Size": 384
> },
> {
> "LastModified": "2020-03-27T07:39:10.000Z",
> "VersionId": "JSNjTDHSQqe9zSAV93bc6TXPuqA.vDJE",
> "ETag": "\"3def7238a0858c17c62d7045290175cf\"",
> "StorageClass": "STANDARD",
> "Key":
> "tmp/vijayant/test/s3a/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet",
>
> "Owner": {
> "DisplayName": "sysops+stage",
> "ID":
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
> },
> "IsLatest": true,
> "Size": 384
> }
> ]
> }
> {code}
> This in turn makes listing objects slow and we have even noticed timeouts due
> to too many delete markers.
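
The delete-marker counts in listings like the ones above can be checked without reading the JSON by eye. The following is a minimal sketch, not part of the original report: `count_delete_markers` is a hypothetical helper name, and it assumes the pretty-printed JSON layout shown in the listings, where the "DeleteMarkers" array opens on its own line and closes on a line containing only `]` or `],`.

```shell
#!/bin/sh
# Count the entries of the "DeleteMarkers" array in JSON read from stdin.
# Assumption: input is pretty-printed as in the listings above, so the array
# opens with a line containing "DeleteMarkers": [ and ends at the first line
# that holds only "]" or "],". Each entry contributes exactly one "Key": token.
count_delete_markers() {
  awk '
    # Opening line of the DeleteMarkers array: start counting after it.
    /"DeleteMarkers"[[:space:]]*:[[:space:]]*\[/ { in_markers = 1; next }
    # A line holding only the closing bracket ends the array.
    in_markers && /^[[:space:]]*\],?[[:space:]]*$/ { in_markers = 0 }
    # One "Key" field per delete marker while inside the array.
    in_markers && /"Key"[[:space:]]*:/ { count++ }
    # Print 0 rather than an empty string when no markers were seen.
    END { print count + 0 }
  '
}
```

For example, piping the command from the report into it, `aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3a/ | count_delete_markers`, prints the number of delete markers under that prefix, provided the CLI output format is JSON. A JSON-aware tool would be more robust if the layout differs.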