Re: RFR: 8175010: ImageReader is not thread-safe

Peter Levart Thu, 16 Feb 2017 00:18:36 -0800

Hi Claes,

On 02/15/2017 10:41 PM, Claes Redestad wrote:

Hi Peter,


are you suggesting that if I have class Foo { Bar b; }, then creating
and putting Foo b in a CHM before returning it to a consumer which is
then read from another thread is enough to force b to be safely
published even when the other thread does *not* get the object via a
call to the same CHM (which would go via the same volatile read and add
the necessary happens before relationship)?  I recalled having seen
examples to the effect that a non-volatile b isn't safely published in
this case.

No, I'm not suggesting that. I'm merely observing (and hoping that myIDE didn't miss places in code where this is not the case) that there isno unsafe publication going on here. Obtaining ImageReader fromImageReaderFactory.get() is safe as it either returns an instanceconstructed in the same thread or it returns an instance constructed ina different thread, but in that case, the instance is handed throughCHM.set/get which enforces ordering. The only other place where I foundImageReader.open() call is in constructing the SystemImageimplementation as an anonymous inner instance which holds an ImageReaderin its synthetic final field which ensures safe publication ofImageReader even if SystemImage instance is published unsafely.


The (very shaky) hypothesis is thus that this could be what's happening
in any of the places which load and locally cache the system ImageReader

for anyone to use, e.g., SystemModuleFinder.SystemImage

...here the ImageReader instance is obtained viaImageReaderFactory.getImageReader():


    /**
     * Holder class for the ImageReader
     */
    private static class SystemImage {
        static final ImageReader READER;
        static {
            long t0 = System.nanoTime();
            READER = ImageReaderFactory.getImageReader();
            initTime.addElapsedTimeFrom(t0);
        }

        static ImageReader reader() {
            return READER;
        }
    }

...which is just a shortcut forImageReaderFactory.get(BOOT_MODULES_JIMAGE) and I believeImageReaderFactory.get() publishes (and did publish even before thepatch) ImageReader instances safely (old code):

52 private static final Map<Path, ImageReader> readers = newConcurrentHashMap<>();

  53
  54     /**

55 * Returns an {@code ImageReader} to read from the given imagefile

  56      */
  57     public static ImageReader get(Path jimage) throws IOException {
  58         Objects.requireNonNull(jimage);
  59         ImageReader reader = readers.get(jimage);
  60         if (reader != null) {

61 return reader; // <<-- HERE the instance was handedthrough CHM

  62         }
  63         reader = ImageReader.open(jimage);
  64         // potential race with other threads opening the same URL
  65         ImageReader r = readers.putIfAbsent(jimage, reader);
  66         if (r == null) {

67 return reader; // <<-- HERE the instance wasconstructed in same thread

  68         } else {
  69             reader.close();
  70             return r; // <<-- HERE the instance was handed through CHM
  71         }
  72     }

or
JavaRuntimeURLConnection (which is implicitly called when there's a
security manager installed).

...here the instance is also obtained viaImageReaderFactory.getImageReader():


    // ImageReader to access resources in jimage
    private static final ImageReader reader;
    static {

PrivilegedAction<ImageReader> pa =ImageReaderFactory::getImageReader;

        reader = AccessController.doPrivileged(pa);
    }

In both above places the ImageReader instance is obtained in <clinit>and this is another "layer" of synchronization between consumer andproducer threads.

I might be (in fact likely am) wrong, but
we discussed this offline and came to the conclusion that there was no
harm in implementing these improvements regardless of whether they
actually resolve 8174817 or not.

No, there's no harm and I support the changes. It's just that they mightnot help in resolving the problem.


I think prior to this patch a concurrent ImageReader.close() could
happen if there was a race between 3 or more threads to resolve the
same Path from ImageReaderFactory.get (especially since there might be
a longish time window there since we might block to load a library
etc), so I don't think we lose anything from plugging that hole by
using computeIfAbsent here.

I don't believe this was the the source of concurrent close(). If youlook at the code (above), you can see that reader.close() (line 69) iscalled only if the reader was not successfully published and it iscalled by the thread that constructed it. But computeIfAbsent is a goodchoice here to avoid optimistic constructions followed by close(s) whenthere is concurrency.


I think your observations about potential issues in JRTFS is correct,
but there was nothing to suggest JRTFS code was involved in JDK-8174817
(as it's not code that's used by the BuiltinClassLoader).

The only remaining ImageReader.close() invocation is now injdk.internal.jrtfs.SystemImage anonymous inner class delegated fromSystemImage.close(), which is only called from JrtFileSystem.close() or.finalize(). So you think JRTFS is not the source of the problem? Whatif JRTFS is accessing the same image as JavaRuntimeURLConnection in thesame VM. Different instances of ImageReader would be used for them, butthey would share the same underlying SharedImageReader. Currently Idon't see a problem in SharedImageReader code, but I haven't studied itcarefully yet. Maybe there is something to be found there...


Regards, Peter


Thanks!

/Claes

On 2017-02-15 21:52, Peter Levart wrote:

Hi Claes,

Reading the https://bugs.openjdk.java.net/browse/JDK-8174817 and then
the change that was just pushed, I can't seem to figure out what was the
problem with original code. I can't find evidence for claims in
https://bugs.openjdk.java.net/browse/JDK-8175010 . Is the problem
publication of ImageReader via ImageReaderFactory? That can't be it,
since ImageReaderFactory is using ConcurrentHashMap which ensures
happens before relationships.

Is there any place else where ImageReader.open(Path) is called and then
the instance unsafely published to other threads? The only place I could
find is in jdk.internal.jrtfs.SystemImage.open():

    static SystemImage open() throws IOException {
        if (modulesImageExists) {
            // open a .jimage and build directory structure
            final ImageReader image = ImageReader.open(moduleImageFile);
            image.getRootDirectory();
            return new SystemImage() {
                @Override
                Node findNode(String path) throws IOException {
                    return image.findNode(path);
                }
                @Override
                byte[] getResource(Node node) throws IOException {
                    return image.getResource(node);
                }
                @Override
                void close() throws IOException {
                    image.close();
                }
            };
        }

...here the final 'image' local variable is captured by anonymous inner
SystemImage subclass into a synthetic final field, so this final field

ensures that ImageReader is published safely as a delegate ofSystemImage.


ImageReader.open(Path) factory method delegates to
ImageReader.SharedImageReader.open(Path, ByteOrder) which creates a new
instance of ImageReader encapsulating a SharedImageReader object:

          public static ImageReader open(Path imagePath, ByteOrder
byteOrder) throws IOException {
            Objects.requireNonNull(imagePath);
            Objects.requireNonNull(byteOrder);

            synchronized (OPEN_FILES) {
                SharedImageReader reader = OPEN_FILES.get(imagePath);

                if (reader == null) {
                    // Will fail with an IOException if wrong byteOrder.

reader = new SharedImageReader(imagePath,byteOrder);

                    OPEN_FILES.put(imagePath, reader);
                } else if (reader.getByteOrder() != byteOrder) {
                    throw new IOException("\"" + reader.getName() + "\"
is not an image file");
                }

ImageReader image = new ImageReader(reader); // <<--HERE

                reader.openers.add(image);

                return image;
            }
        }

...the ImageReader returned from this method is not safe to publish via
data race, but as far as I can see, there is no such publishing going
on. So am I right in that all this patch solves is it eliminates a
possibility of NPE when ImageReader is close()-d concurrently with being
used from some other thread. If such NPE was observed, it means that
close() is being called concurrently with access and there are still
races possible which can cause undesired effects. For example: calling
ImageReader.close() while using it concurrently may close underlying
SharedImageReader and then after closing, access it.

So is there a concurrent ImageReader.close() possible? The only place I
see ImageReader.close() being invoked is from SystemImage.close() in the
anonymous inner class implementation which wraps ImageReader.
SystemImage.close() is only being invoked from JrtFileSystem.cleanup(),
which is called from JrtFileSystem.close() and JrtFileSystem.finalize().

The following is theoretically possible:

FileSystem fs = FileSystems.newFileSystem(URI.create("jrt:/"), ...);

Path p = fs.getPath(...);
FileSystemProvider provider = fs.provider();
InputStream is = provider.newInputStream(p, ...);
// 'fs' and 'p' (which has a reference to 'fs') may be found finalizable
and be finalized while the call to obtain content of input stream is in
progress

For this to be prevented, the following method in JrtFileSystem:

    // returns the content of the file resource specified by the path
    byte[] getFileContent(JrtPath path) throws IOException {
        Node node = checkNode(path);
        if (node.isDirectory()) {
            throw new FileSystemException(path + " is a directory");
        }
        //assert node.isResource() : "resource node expected here";
        return image.getResource(node);
    }

...would have to be changed to:

    byte[] getFileContent(JrtPath path) throws IOException {
        Node node = checkNode(path);
        if (node.isDirectory()) {
            throw new FileSystemException(path + " is a directory");
        }
        //assert node.isResource() : "resource node expected here";
        byte[] content = image.getResource(node);
        Reference.reachabilityFence(this);
        return content;
    }


I don't claim that this is what causes the problems and I haven't found
any such usages, but this is theoretically possible.

Regards, Peter

On 02/15/2017 02:22 PM, Claes Redestad wrote:

Hi,

a few intermittent but rare test failures[1] that has appeared
since the latest jake integration, and since one of the changes
in there was to make initialization of the system ImageReader
lazy there appears to be cases where ImageReaders are not
safely published:

- Ensure ImageReader::open is called only once per Path in
ImageReaderFactory by using CHM.computeIfAbsent
- Ensure ImageReader.reader is safely published to a
final field and signal close using a volatile boolean instead

webrev: http://cr.openjdk.java.net/~redestad/8175010/webrev.02/
bug: https://bugs.openjdk.java.net/browse/JDK-8175010

Testing shows no issues (which admittedly doesn't mean we're
actually solving the root cause for JDK-8174817), and performance
numbers from adding a volatile read indicate that any overhead
is lost in the noise on ImageReader-heavy workloads.

Thanks!

/Claes

[1] https://bugs.openjdk.java.net/browse/JDK-8174817

Re: RFR: 8175010: ImageReader is not thread-safe

Reply via email to