[ 
https://issues.apache.org/jira/browse/HADOOP-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192199#comment-17192199
 ] 

Steve Loughran commented on HADOOP-16942:
-----------------------------------------

One more point: you said you are on EMR version 6.0.0.

Amazon EMR's connector to S3 is their own closed-source client, so please take 
this up through your AWS support channel. Thanks.

> S3A creating folder level delete markers
> ----------------------------------------
>
>                 Key: HADOOP-16942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16942
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs/s3
>    Affects Versions: 2.8.3, 3.2.1
>            Reporter: vijayant soni
>            Priority: Major
>
> Using the S3A URL scheme while writing data from Spark to S3 creates many 
> folder-level delete markers.
> Writing the same data with the S3 URL scheme does not create any delete 
> markers at all.
>  
> Spark - 2.4.4
> Hadoop - 3.2.1
> EMR version - 6.0.0
> Write Mode - Append
> {code:scala}
> [hadoop@ip-192-0-161-212 ~]$ spark-shell
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 20/03/27 07:37:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> Spark context Web UI available at http://ip-192-0-161-212.ec2.internal:4040
> Spark context available as 'sc' (master = yarn, app id = application_1585294390030_0003).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
>       /_/
>          
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> val df = spark.sql("select 1 as a")
> df: org.apache.spark.sql.DataFrame = [a: int]
> scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3://my-bucket/tmp/vijayant/test/s3/")
> scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3a://my-bucket/tmp/vijayant/test/s3a/")
> scala> 
> {code}
> Listing object versions and delete markers after the `s3` write:
> {code:bash}
> aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3/
> {
>     "Versions": [
>         {
>             "LastModified": "2020-03-27T07:38:17.000Z", 
>             "VersionId": "V06OzeE7j221Tq7keSGj8bveCYyJFIcf", 
>             "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
>             "StorageClass": "STANDARD", 
>             "Key": "tmp/vijayant/test/s3/_SUCCESS", 
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "Size": 0
>         }, 
>         {
>             "LastModified": "2020-03-27T07:38:16.000Z", 
>             "VersionId": "dLYtHDugLhFIdw2YHLFmoFOxXkm.21Wo", 
>             "ETag": "\"26e70a1e26c709e3e8498acd49cfaaa3-1\"", 
>             "StorageClass": "STANDARD", 
>             "Key": 
> "tmp/vijayant/test/s3/part-00000-9d9a8925-f119-415d-b547-b742396e2ca7-c000.snappy.parquet",
>  
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "Size": 384
>         }
>     ]
> } 
> {code}
> Listing object versions and delete markers after the `s3a` write:
> {code:bash}
> aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3a/
> {
>     "DeleteMarkers": [
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "VersionId": "NJWRZMcb_eYYwCJh_isX4H1Ox6W362Wb", 
>             "Key": "tmp/vijayant/test/s3a/", 
>             "LastModified": "2020-03-27T07:39:11.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "F0h0mLcVVwkMtcHxd95Hj7BACL4Up_Q9", 
>             "Key": "tmp/vijayant/test/s3a/", 
>             "LastModified": "2020-03-27T07:39:10.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": ".sBcE6cXeggekOnSgZ4n7pyCDHnsLERK", 
>             "Key": "tmp/vijayant/test/s3a/", 
>             "LastModified": "2020-03-27T07:39:10.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "nzm39jiUPC4H0ZaS.5Shp0FYPnR8wNf9", 
>             "Key": "tmp/vijayant/test/s3a/", 
>             "LastModified": "2020-03-27T07:39:09.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "BPM65R1HkZngPDYtDL3zPZYPw_G_m9Ic", 
>             "Key": "tmp/vijayant/test/s3a/", 
>             "LastModified": "2020-03-27T07:39:08.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "VersionId": "LJt8_MVDOiD4UdgUqEMycxjvtinJlTNt", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/", 
>             "LastModified": "2020-03-27T07:39:11.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "RqunJTn8Od0PgFR4yu44PX4kL54k6EDv", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/", 
>             "LastModified": "2020-03-27T07:39:09.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "4vY8cnqUI5VJAk3VfEt_VD_KEczo3bmY", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/", 
>             "LastModified": "2020-03-27T07:39:08.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "VersionId": "ln47YYy.yiE.k70cvqvfgYCEQoYFnKQW", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
>             "LastModified": "2020-03-27T07:39:11.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "5Bsrt7s1caM90mzGNgk0MsTU9q8UjTTA", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
>             "LastModified": "2020-03-27T07:39:09.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "VersionId": "pN3HzDfnmqIqrMwAL2jqKEBkvoHZALor", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/", 
>             "LastModified": "2020-03-27T07:39:11.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "wg91poO1KXReXxvsZHzZXrHR1IgIX8t2", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/", 
>             "LastModified": "2020-03-27T07:39:09.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "VersionId": "cv5Noykq3sMilQqJXAH3E.N7qAWnIBx7", 
>             "Key": 
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/",
>  
>             "LastModified": "2020-03-27T07:39:11.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "VersionId": "6xzt9SxlCUJaOLD8krkE3yXfQU14rErX", 
>             "Key": 
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/",
>  
>             "LastModified": "2020-03-27T07:39:09.000Z"
>         }, 
>         {
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "VersionId": "wGmJAo7x_gkLWAiHzxPGdPMVSus7Wcp1", 
>             "Key": 
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet",
>  
>             "LastModified": "2020-03-27T07:39:10.000Z"
>         }
>     ], 
>     "Versions": [
>         {
>             "LastModified": "2020-03-27T07:39:11.000Z", 
>             "VersionId": "2py_ZXKl7yh6fwhzksAx8Os1BriDJCBb", 
>             "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
>             "StorageClass": "STANDARD", 
>             "Key": "tmp/vijayant/test/s3a/_SUCCESS", 
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "Size": 0
>         }, 
>         {
>             "LastModified": "2020-03-27T07:39:08.000Z", 
>             "VersionId": "lDqTnLCqDYtjrOiY.V7E6AKTRQLKrqUT", 
>             "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
>             "StorageClass": "STANDARD", 
>             "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "Size": 0
>         }, 
>         {
>             "LastModified": "2020-03-27T07:39:10.000Z", 
>             "VersionId": "g.rGoTDdmrGrNjrLchvwz3jMmGePkgiD", 
>             "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
>             "StorageClass": "STANDARD", 
>             "Key": 
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/",
>  
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "Size": 0
>         }, 
>         {
>             "LastModified": "2020-03-27T07:39:09.000Z", 
>             "VersionId": ".ZCpY2UW4hRlbLL87dFUJRuk021Hyq8p", 
>             "ETag": "\"3def7238a0858c17c62d7045290175cf\"", 
>             "StorageClass": "STANDARD", 
>             "Key": 
> "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet",
>  
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": false, 
>             "Size": 384
>         }, 
>         {
>             "LastModified": "2020-03-27T07:39:10.000Z", 
>             "VersionId": "JSNjTDHSQqe9zSAV93bc6TXPuqA.vDJE", 
>             "ETag": "\"3def7238a0858c17c62d7045290175cf\"", 
>             "StorageClass": "STANDARD", 
>             "Key": 
> "tmp/vijayant/test/s3a/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet",
>  
>             "Owner": {
>                 "DisplayName": "sysops+stage", 
>                 "ID": 
> "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
>             }, 
>             "IsLatest": true, 
>             "Size": 384
>         }
>     ]
> }
> {code}
> This in turn makes listing objects slow, and we have even seen timeouts 
> caused by too many delete markers.
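
To compare prefixes without reading the raw JSON by eye, the `list-object-versions` output can be summarised per key. The following is only a minimal sketch in Python, not part of the original report: the helper name `count_delete_markers` and the trimmed sample data are hypothetical, and it assumes the response has the `DeleteMarkers` / `Key` shape shown in the listings above.

```python
import json
from collections import Counter


def count_delete_markers(listing: dict) -> Counter:
    """Count delete markers per key in a parsed list-object-versions response.

    `listing` is assumed to be the JSON document printed by
    `aws s3api list-object-versions`; a response with no delete markers
    simply has no "DeleteMarkers" entry, so we default to an empty list.
    """
    return Counter(marker["Key"] for marker in listing.get("DeleteMarkers", []))


# Hypothetical, trimmed sample shaped like the s3a listing quoted above.
sample = json.loads("""
{
  "DeleteMarkers": [
    {"Key": "tmp/vijayant/test/s3a/", "IsLatest": true},
    {"Key": "tmp/vijayant/test/s3a/", "IsLatest": false},
    {"Key": "tmp/vijayant/test/s3a/_temporary/", "IsLatest": true}
  ],
  "Versions": [
    {"Key": "tmp/vijayant/test/s3a/_SUCCESS", "IsLatest": true, "Size": 0}
  ]
}
""")

counts = count_delete_markers(sample)

# Print the keys with the most delete markers first.
for key, n in counts.most_common():
    print(f"{n:4d}  {key}")
```

To run it on real data, redirect the CLI output to a file (for example `aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3a/ > versions.json`) and pass `json.load(open("versions.json"))` to the helper. The s3a listing quoted above contains 15 delete markers, five of them on the key `tmp/vijayant/test/s3a/`; the s3 listing contains none.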


