[
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125872#comment-16125872
]
Ewan Higgs commented on HADOOP-13786:
-------------------------------------
Hi, I have been testing this locally with the [Hortonworks cloud-integration
project|https://github.com/hortonworks-spark/cloud-integration] and an S3
compatible backend that is has strong consistency. Because it has strong
consistency one would expect the {{NullMetadataStore}} to work. However, I'm
getting some errors.
To reproduce, I build Hadoop as follows:
{code}
mvn install -DskipShade -Dmaven.javadoc.skip=true -Pdist,parallel-tests
-DtestsThreadCount=8 -Djava.awt.headless=true -Ddeclared.hadoop.version=2.11
-DskipTests
{code}
I ran into some NPEs:
{code}
S3AFileStatus{path=s3a://s3guard-test/cloud-integration/DELAY_LISTING_ME/S3ACommitDataframeSuite/dataframe-committer/committer-default-orc/orc/part-00000-8b8b323b-c747-4d72-b331-b6de1c1f8387-c000.snappy.orc;
isDirectory=false; length=2995; replication=1; blocksize=1048576;
modification_time=1502715661000; access_time=0; owner=ehiggs; group=ehiggs;
permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false;
isErasureCoded=false} isEmptyDirectory=FALSE
2017-08-14 15:01:02,297 [ScalaTest-main-running-S3ACommitDataframeSuite] DEBUG
s3a.S3AFileSystem (Listing.java:sourceHasNext(374)) - Start iterating the
provided status.
2017-08-14 15:01:02,304 [ScalaTest-main-running-S3ACommitDataframeSuite] ERROR
commit.S3ACommitDataframeSuite (Logging.scala:logError(91)) - After 237,747,296
nS: java.lang.NullPointerException
java.lang.NullPointerException
at
org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:87)
at
org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles$3.apply(InMemoryFileIndex.scala:299)
at
org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles$3.apply(InMemoryFileIndex.scala:281)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at
org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:281)
at
org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$bulkListLeafFiles$1.apply(InMemoryFileIndex.scala:172)
at
org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$bulkListLeafFiles$1.apply(InMemoryFileIndex.scala:171)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
{code}
I'll attach the trace.
> Add S3Guard committer for zero-rename commits to S3 endpoints
> -------------------------------------------------------------
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch,
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch,
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch,
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch,
> HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch,
> HADOOP-13786-HADOOP-13345-012.patch, HADOOP-13786-HADOOP-13345-013.patch,
> HADOOP-13786-HADOOP-13345-015.patch, HADOOP-13786-HADOOP-13345-016.patch,
> HADOOP-13786-HADOOP-13345-017.patch, HADOOP-13786-HADOOP-13345-018.patch,
> HADOOP-13786-HADOOP-13345-019.patch, HADOOP-13786-HADOOP-13345-020.patch,
> HADOOP-13786-HADOOP-13345-021.patch, HADOOP-13786-HADOOP-13345-022.patch,
> HADOOP-13786-HADOOP-13345-023.patch, HADOOP-13786-HADOOP-13345-024.patch,
> HADOOP-13786-HADOOP-13345-025.patch, HADOOP-13786-HADOOP-13345-026.patch,
> HADOOP-13786-HADOOP-13345-027.patch, HADOOP-13786-HADOOP-13345-028.patch,
> HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-029.patch,
> HADOOP-13786-HADOOP-13345-030.patch, HADOOP-13786-HADOOP-13345-031.patch,
> HADOOP-13786-HADOOP-13345-032.patch, HADOOP-13786-HADOOP-13345-033.patch,
> HADOOP-13786-HADOOP-13345-035.patch, objectstore.pdf, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output
> streams (ie. not visible until the close()), if we need to use that to allow
> us to abort commit operations.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]