[
https://issues.apache.org/jira/browse/HADOOP-18456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608688#comment-17608688
]
ASF GitHub Bot commented on HADOOP-18456:
-----------------------------------------
dannycjones commented on code in PR #4909:
URL: https://github.com/apache/hadoop/pull/4909#discussion_r978470575
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/WeakReferenceMap.java:
##########
@@ -132,35 +145,93 @@ public WeakReference<V> lookup(K key) {
* @return an instance.
*/
public V get(K key) {
- final WeakReference<V> current = lookup(key);
- V val = resolve(current);
- if (val != null) {
+ final WeakReference<V> currentWeakRef = lookup(key);
+ // resolve it, after which if not null, we have a strong reference
+ V strongVal = resolve(currentWeakRef);
+ if (strongVal != null) {
// all good.
- return val;
+ return strongVal;
}
- // here, either no ref, or the value is null
- if (current != null) {
+ // here, either currentWeakRef was null, or its reference was GC'd.
+ if (currentWeakRef != null) {
+ // garbage collection removed the reference.
+
+ // explicitly remove the weak ref from the map if it has not
+ // been updated by this point
+ // this is here just for completeness.
+ map.remove(key, currentWeakRef);
+
+ // log/report the loss.
noteLost(key);
}
+
+ // create a new value and add it to the map
return create(key);
}
/**
* Create a new instance under a key.
+ * <p>
* The instance is created, added to the map and then the
* map value retrieved.
* This ensures that the reference returned is that in the map,
* even if there is more than one entry being created at the same time.
+ * If that race does occur, it will be logged the first time it happens
+ * for this specific map instance.
+ * <p>
+ * HADOOP-18456 highlighted the risk of a concurrent GC resulting in a null
+ * value being retrieved and so returned.
+ * To prevent this:
+ * <ol>
+ * <li>A strong reference is retained to the newly created instance
+ * in a local variable.</li>
+ * <li>That variable is used after the resolution process, to ensure
+ * the JVM doesn't consider it "unreachable" and so eligible for GC.</li>
+ * <li>A check is made for the resolved reference being null, and if so,
+ * the put() is repeated</li>
+ * </ol>
* @param key key
- * @return the value
+ * @return the created value
*/
public V create(K key) {
entriesCreatedCount.incrementAndGet();
- WeakReference<V> newRef = new WeakReference<>(
- requireNonNull(factory.apply(key)));
- map.put(key, newRef);
- return map.get(key).get();
+ /*
+ Get a strong ref so even if a GC happens in this method the reference is not lost.
+ It is NOT enough to have a reference in a field, it MUST be used
+ so as to ensure the reference isn't optimized away prematurely.
+ "A reachable object is any object that can be accessed in any potential continuing
+ computation from any live thread."
+ */
+
+ final V strongRef = requireNonNull(factory.apply(key),
+ "factory returned a null instance");
+ V resolvedStrongRef;
+ do {
+ WeakReference<V> newWeakRef = new WeakReference<>(strongRef);
+
+ // put it in the map
+ map.put(key, newWeakRef);
+
+ // get it back from the map
+ WeakReference<V> retrievedWeakRef = map.get(key);
+ // resolve that reference, handling the situation where somehow it was removed from the map
+ // between the put() and the get()
+ resolvedStrongRef = resolve(retrievedWeakRef);
Review Comment:
If we have a strong reference on L207, why do we expect to lose it?
I could see it happening in the old `create(K key)` implementation, but less
so here.
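To make the question concrete, here is a minimal, self-contained sketch of the pattern under discussion. It is not the code in this PR: `WeakMapSketch` is a hypothetical class, and it falls back to the local strong reference instead of repeating the `put()` in a loop. The assumption it illustrates is that a referent cannot be collected while a local variable holding it is still used later in the method, so after the `put()` a null resolution should only come from another thread replacing or removing the entry.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical, simplified stand-in for a weak-reference map; not Hadoop's class. */
final class WeakMapSketch<K, V> {

  private final Map<K, WeakReference<V>> map = new ConcurrentHashMap<>();
  private final Function<K, V> factory;

  WeakMapSketch(Function<K, V> factory) {
    this.factory = Objects.requireNonNull(factory);
  }

  /** Create a value, publish it under the key, and return a non-null instance. */
  V create(K key) {
    // Strong reference held in a local variable for the whole method.
    final V strongRef = Objects.requireNonNull(factory.apply(key),
        "factory returned a null instance");

    map.put(key, new WeakReference<>(strongRef));

    // Read back what is in the map; another thread may have replaced
    // or removed the entry between the put() and the get().
    WeakReference<V> retrieved = map.get(key);
    V resolved = retrieved == null ? null : retrieved.get();

    // Using strongRef here keeps it reachable up to this point,
    // and guarantees the method never returns null.
    return resolved != null ? resolved : strongRef;
  }

  public static void main(String[] args) {
    WeakMapSketch<String, StringBuilder> sketch =
        new WeakMapSketch<>(StringBuilder::new);
    StringBuilder value = sketch.create("key");
    System.gc(); // only a hint to the JVM; 'value' is still strongly held
    System.out.println(value.length());
  }
}
```

With this shape, the value returned may differ from the one currently in the map if a concurrent `create()` wins the race and its value is later collected, which is the trade-off against repeating the `put()`. On Java 9 and later, `java.lang.ref.Reference.reachabilityFence(strongRef)` is another way to keep the local reference reachable without depending on a later use.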
> NullPointerException in ObjectListingIterator's constructor
> -----------------------------------------------------------
>
> Key: HADOOP-18456
> URL: https://issues.apache.org/jira/browse/HADOOP-18456
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.3.9
> Reporter: Quanlong Huang
> Assignee: Steve Loughran
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> We saw NullPointerExceptions in Impala's S3 tests (IMPALA-11592). They are
> thrown from the hadoop jar:
> {noformat}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.<init>(Listing.java:621)
> at org.apache.hadoop.fs.s3a.Listing.createObjectListingIterator(Listing.java:163)
> at org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:144)
> at org.apache.hadoop.fs.s3a.Listing.getListFilesAssumingDir(Listing.java:212)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:4790)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listFiles$37(S3AFileSystem.java:4732)
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2363)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2382)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:4731)
> at org.apache.impala.common.FileSystemUtil.listFiles(FileSystemUtil.java:754)
> ... {noformat}
> We are using a private build of the hadoop jar. Version: CDP 3.1.1.7.2.16.0-164
> Code snippet showing where the NPE is thrown:
> {code:java}
> 604 @Retries.RetryRaw
> 605 ObjectListingIterator(
> 606 Path listPath,
> 607 S3ListRequest request,
> 608 AuditSpan span) throws IOException {
> 609 this.listPath = listPath;
> 610 this.maxKeys = listingOperationCallbacks.getMaxKeys();
> 611 this.request = request;
> 612 this.objectsPrev = null;
> 613 this.iostats = iostatisticsStore()
> 614 .withDurationTracking(OBJECT_LIST_REQUEST)
> 615 .withDurationTracking(OBJECT_CONTINUE_LIST_REQUEST)
> 616 .build();
> 617 this.span = span;
> 618 this.s3ListResultFuture = listingOperationCallbacks
> 619 .listObjectsAsync(request, iostats, span);
> 620 this.aggregator = IOStatisticsContext.getCurrentIOStatisticsContext()
> 621 .getAggregator(); // <---- thrown here
> 622 }
> {code}