[ https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035262#comment-16035262 ]

Steve Loughran commented on HADOOP-14457:
-----------------------------------------

OK, I am effectively seeing this in my committer tests: the file 
{{s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS}}
 exists, but an attempt to list the parent dir fails because a delete marker is 
found there instead.

{code}
2017-06-02 19:59:19,791 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG 
s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path 
status for 
s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS
  
(cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS)
2017-06-02 19:59:19,791 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG 
s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - 
get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS)
 -> file  
s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS
 3404    UNKNOWN  false 
S3AFileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS;
 isDirectory=false; length=3404; replication=1; blocksize=1048576; 
modification_time=1496429958524; access_time=0; owner=stevel; group=stevel; 
permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
isErasureCoded=false} isEmptyDirectory=FALSE
2017-06-02 19:59:19,792 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG 
s3a.S3AFileSystem (S3AFileSystem.java:innerListStatus(1660)) - List status for 
path: 
s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc
2017-06-02 19:59:19,792 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG 
s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path 
status for 
s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc
  
(cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc)
2017-06-02 19:59:19,792 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG 
s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - 
get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc)
 -> file  
s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc
 0       UNKNOWN  true  
FileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc;
 isDirectory=false; length=0; replication=0; blocksize=0; 
modification_time=1496429951655; access_time=0; owner=; group=; 
permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; 
isErasureCoded=false}
2017-06-02 19:59:19,801 [dispatcher-event-loop-6] INFO  
spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(54)) - 
MapOutputTrackerMasterEndpoint stopped!
2017-06-02 19:59:19,811 [dispatcher-event-loop-3] INFO  
scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint 
(Logging.scala:logInfo(54)) - OutputCommitCoordinator stopped!
2017-06-02 19:59:19,814 [ScalaTest-main-running-S3ACommitDataframeSuite] INFO  
spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped 
SparkContext
- Dataframe+partitioned *** FAILED ***
  java.io.FileNotFoundException: Path 
s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc
 is recorded as deleted by S3Guard
  at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1906)
  at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1881)
  at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1664)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1640)
  at 
com.hortonworks.spark.cloud.ObjectStoreOperations$class.validateRowCount(ObjectStoreOperations.scala:340)
  at 
com.hortonworks.spark.cloud.CloudSuite.validateRowCount(CloudSuite.scala:37)
  at 
com.hortonworks.spark.cloud.s3.commit.S3ACommitDataframeSuite.testOneFormat(S3ACommitDataframeSuite.scala:111)
  at 
com.hortonworks.spark.cloud.s3.commit.S3ACommitDataframeSuite$$anonfun$1$$anonfun$apply$2.apply$mcV$sp(S3ACommitDataframeSuite.scala:71)
  at 
com.hortonworks.spark.cloud.CloudSuiteTrait$$anonfun$ctest$1.apply$mcV$sp(CloudSuiteTrait.scala:66)
  at 
com.hortonworks.spark.cloud.CloudSuiteTrait$$anonfun$ctest$1.apply(CloudSuiteTrait.scala:64)
{code}
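
Stripped of the committer machinery, the sequence the trace shows is roughly the one below. This is only a minimal sketch against the public FileSystem API, not one of the actual test suites: the bucket and paths are placeholders, and it assumes the directory delete leaves a tombstone in the metadata store, as the "recorded as deleted" message suggests.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch of the failing sequence; bucket/paths are placeholders. */
public class ListParentAfterCreate {
  public static void main(String[] args) throws Exception {
    Path success = new Path("s3a://example-bucket/work/output/_SUCCESS");
    FileSystem fs = success.getFileSystem(new Configuration());

    // make the parent exist, then delete it: S3Guard now holds a tombstone
    Path parent = success.getParent();
    fs.mkdirs(parent);
    fs.delete(parent, true);

    // normal create/write/close sequence, as used for _SUCCESS
    fs.create(success, true).close();

    // getFileStatus on the file itself succeeds...
    FileStatus st = fs.getFileStatus(success);
    System.out.println("file exists, length = " + st.getLen());

    // ...but listing the parent fails with
    //   FileNotFoundException: ... is recorded as deleted by S3Guard
    // because create() never updated the parent's entry in the store.
    for (FileStatus child : fs.listStatus(parent)) {
      System.out.println(child.getPath());
    }
  }
}
{code}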

The _SUCCESS file is created by a normal create/write/close sequence, so 
the patch here would fix it. But other operations simply complete a 
multipart PUT call and do not follow the same path. 

What every write path does do is call {{S3AFileSystem.finishedWrite()}}, which is 
used to set the final status and length. Maybe that is where the final 
operations should take place, specifically:
# Normal FS: delete any/all parent directories.
# S3Guard: add the parent entries (sketched below).
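
For the S3Guard half, a rough sketch of the idea: once the write has completed, walk up from the new path and put a directory entry for every ancestor the metadata store does not already have a live entry for, so a stale tombstone on the parent no longer masks the freshly written file. The class and method names below are made up for illustration, not the committed patch; {{MetadataStore.get()}}/{{put()}}, {{PathMetadata}} and its {{isDeleted()}} check are assumed from the HADOOP-13345 branch.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.apache.hadoop.fs.s3a.s3guard.PathMetadata;

/** Illustration only: record a file's ancestor directories in the store. */
public final class AncestorSketch {
  private AncestorSketch() {
  }

  /**
   * Would be called from something like finishedWrite(): ensure every
   * parent of the qualified path has a live directory entry.
   */
  public static void addAncestors(MetadataStore ms, Path qualifiedPath)
      throws IOException {
    Path parent = qualifiedPath.getParent();
    while (parent != null && !parent.isRoot()) {
      PathMetadata entry = ms.get(parent);
      // overwrite missing entries and tombstones (isDeleted(): assumed check)
      if (entry == null || entry.isDeleted()) {
        // a plain FileStatus with isdir=true stands in for the directory;
        // the real code would presumably build an S3AFileStatus (assumption)
        FileStatus dirStatus = new FileStatus(0, true, 1, 0,
            System.currentTimeMillis(), parent);
        ms.put(new PathMetadata(dirStatus));
      }
      parent = parent.getParent();
    }
  }
}
{code}

Overwriting the tombstoned parent entries is exactly what would clear the "recorded as deleted by S3Guard" state the stack trace above trips over, regardless of whether the file arrived via create() or a completed multipart PUT.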



> create() does not notify metadataStore of parent directories or ensure 
> they're not existing files
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14457
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14457
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>         Attachments: HADOOP-14457-HADOOP-13345.001.patch, 
> HADOOP-14457-HADOOP-13345.002.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. 
> LocalMetadataStore will sometimes erroneously report that a directory is 
> empty with isAuthoritative = true when it *definitely* has children the 
> metadata store should know about. It doesn't appear to happen if the 
> children are just directories. The fact that it's returning an empty 
> listing is concerning, but the fact that it says it's authoritative *might* 
> be a second bug.
> {code}
> diff --git 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
>  
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>    }
>  
>    @VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>      return metadataStore;
>    }
>  
> diff --git 
> a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
>  
> b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> @@ -23,6 +23,11 @@
>  import org.apache.hadoop.fs.contract.AbstractFSContract;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.s3a.S3AFileSystem;
> +import org.apache.hadoop.fs.s3a.Tristate;
> +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
> +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
> +import org.junit.Test;
>  
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
> @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws 
> Throwable {
>      boolean rename = fs.rename(srcDir, destDir);
>      assertFalse("s3a doesn't support rename to non-empty directory", rename);
>    }
> +
> +  @Test
> +  public void testMkdirPopulatesFileAncestors() throws Exception {
> +    final FileSystem fs = getFileSystem();
> +    final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
> +    final Path parent = path("testMkdirPopulatesFileAncestors/source");
> +    try {
> +      fs.mkdirs(parent);
> +      final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
> +      byte[] srcDataset = dataset(256, 'a', 'z');
> +      writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
> +          1024, false);
> +
> +      DirListingMetadata list = ms.listChildren(parent);
> +      assertTrue("MetadataStore falsely reports authoritative empty list",
> +          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
> +    } finally {
> +      fs.delete(parent, true);
> +    }
> +  }
>  }
> {code}


