[
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035217#comment-16035217
]
Steve Loughran commented on HADOOP-13998:
-----------------------------------------
Regarding tests, I'm seeing something up with the combination of s3guard and the
partitioned committer (and only that committer): a newly created file is where it
should be, but the parent dir is still tagged as missing. I can GET the file, but
if I try to list the parent I get rejected:
{code}
2017-06-02 18:19:10,709 [ScalaTest-main-running-S3ACommitDataframeSuite] INFO s3.S3AOperations (Logging.scala:logInfo(54)) - s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/part-00000-7573c876-38e5-4024-8a53-51fa1aa9c9c2-c000.snappy.orc size=384
2017-06-02 18:19:10,709 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path status for s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS (cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS)
2017-06-02 18:19:10,710 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS) -> file s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS 3400 UNKNOWN false S3AFileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc/_SUCCESS; isDirectory=false; length=3400; replication=1; blocksize=1048576; modification_time=1496423948811; access_time=0; owner=stevel; group=stevel; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} isEmptyDirectory=FALSE
2017-06-02 18:19:10,710 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerListStatus(1660)) - List status for path: s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc
2017-06-02 18:19:10,710 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:innerGetFileStatus(1899)) - Getting path status for s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc (cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc)
2017-06-02 18:19:10,711 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG s3guard.MetadataStore (LocalMetadataStore.java:get(151)) - get(s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc) -> file s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc 0 UNKNOWN true FileStatus{path=s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc; isDirectory=false; length=0; replication=0; blocksize=0; modification_time=1496423936532; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}
2017-06-02 18:19:10,719 [dispatcher-event-loop-6] INFO spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(54)) - MapOutputTrackerMasterEndpoint stopped!
2017-06-02 18:19:10,727 [dispatcher-event-loop-3] INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(54)) - OutputCommitCoordinator stopped!
2017-06-02 18:19:10,729 [ScalaTest-main-running-S3ACommitDataframeSuite] INFO spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped SparkContext
- Dataframe+partitioned *** FAILED ***
  java.io.FileNotFoundException: Path s3a://hwdev-steve-new/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/partitioned/orc is recorded as deleted by S3Guard
  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1906)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1881)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1664)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1640)
  at com.hortonworks.spark.cloud.ObjectStoreOperations$class.validateRowCount(ObjectStoreOperations.scala:340)
  at com.hortonworks.spark.cloud.CloudSuite.validateRowCount(CloudSuite.scala:37)
  at com.hortonworks.spark.cloud.s3.commit.S3ACommitDataframeSuite.testOneFormat(S3ACommitDataframeSuite.scala:107)
  at com.hortonworks.spark.cloud.s3.commit.S3ACommitDataframeSuite$$anonfun$1$$anonfun$apply$2.apply$mcV$sp(S3ACommitDataframeSuite.scala:71)
  at com.hortonworks.spark.cloud.CloudSuiteTrait$$anonfun$ctest$1.apply$mcV$sp(CloudSuiteTrait.scala:66)
  at com.hortonworks.spark.cloud.CloudSuiteTrait$$anonfun$ctest$1.apply(CloudSuiteTrait.scala:64)
{code}
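Looking at the stack, the listing is rejected by the S3Guard tombstone check in
{{S3AFileSystem.innerGetFileStatus()}} (line 1906 above); roughly, and this is a
paraphrase of the failure mode rather than the exact branch source:
{code}
// paraphrase of the check that produces the failure above (not the exact source)
PathMetadata pm = metadataStore.get(path);
if (pm != null && pm.isDeleted()) {
  // the destination dir still carries a tombstone, so listing its
  // children is refused before S3 is even asked
  throw new FileNotFoundException("Path " + path
      + " is recorded as deleted by S3Guard");
}
{code}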
I don't know where the blame lies here, but it's something I'd like to
understand first. It does not happen when s3guard is off; there, the new
committer works.
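If I'm reading the store state right, the inconsistency boils down to the sequence
below: the directory entry keeps its tombstone even after a file entry is put
underneath it. A minimal sketch against the {{MetadataStore}} API, assuming
{{LocalMetadataStore}} as in the test run; the paths are illustrative and the
hand-rolled delete/put just stand in for whatever the commit path actually does:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.apache.hadoop.fs.s3a.s3guard.PathMetadata;

public class TombstoneProbe {
  public static void main(String[] args) throws IOException {
    MetadataStore ms = new LocalMetadataStore();
    ms.initialize(new Configuration());

    Path dir = new Path("s3a://bucket/dest/partitioned/orc");   // illustrative paths
    Path part = new Path(dir, "part-00000.snappy.orc");

    ms.delete(dir);                          // earlier delete -> tombstone for the dir
    ms.put(new PathMetadata(                 // committed file then recorded under it
        new FileStatus(384, false, 1, 1048576, System.currentTimeMillis(), part)));

    PathMetadata parent = ms.get(dir);
    // Expected: null or a live entry once a child exists underneath it.
    // Observed in the log above: the dir entry is still "0 UNKNOWN true",
    // i.e. isDeleted() == true, which is what makes listStatus() fail.
    System.out.println(dir + " tombstoned? "
        + (parent != null && parent.isDeleted()));
  }
}
{code}
If that really is the state the committer leaves behind, the open question is
whether recording the new file should be clearing the ancestor tombstones, or
whether the listing path should tolerate them.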
> initial s3guard preview
> -----------------------
>
> Key: HADOOP-13998
> URL: https://issues.apache.org/jira/browse/HADOOP-13998
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Steve Loughran
>
> JIRA to link in all the things we think are needed for a preview/merge into
> trunk