Hello Phil,Thank you very much for the review. No, the original problem consists in the fact that Xlib function "XSetErrorHandler" is not thread-safe, so calling it from different threads by setting one error handler and restoring the previous error handler leads to such not easily reproducible bugs like this and the already fixed 6678385, for example when the second thread restores error handler set by the first thread after the first thread left the code block potentially generating XError and already unset its error handler. The only solution to this problem is introduction of one critical section for all pairs of calls to "XSetErrorHandler" function through WITH_XERROR_HANDLER, RESTORE_XERROR_HANDLER macros in the code of JDK. Even this solution is not complete, because JDK's code cannot influence invoked by it code from the third-party libraries, which also sets or potentially sets own error handlers. The purpose of the fix for "6678385" bug was to guarantee that AWT code sets its global error handler once and for the whole life time of Java application and allows Java code to set "synthetic" or not real error handlers without invocation of "XSetErrorHandler" function. While the idea of this fix is to guarantee that there is no place in JDK other than "src/solaris/native/sun/xawt/XlibWrapper.c", where "XSetErrorHandler" function is called. So this fix substitutes real calls to that native function for setting of "synthetic" error handlers through "sun.awt.X11.XErrorHandlerUtil" class.
Yes, this pattern follows a policy relying on the assumption that no other code has a "more important" need to have its X error handler installed, but with one correction which is "no other code in JDK". So this constraint is not applicable to a code of any third-party library, since it is impossible without collaboration between JDK and third parties on definition of common critical section. Unfortunately, I did not know about the opportunity of embedding Java application into another application.
I also do not know the point of saved_error_handler variable, it became unusable with the fix for the bug "6678385", but this seems to be a stable code which I just had to move from XToolkit class to XErrorHandlerUtil without any modification.
AWT_LOCK/AWT_UNLOCK code was added to guarantee that no other thread from JDK code both Java and native can set and unset synthetic error handlers simultaneously. This is the critical section, which I described in my first passage of this e-mail.
Thank you, Anton On 3/27/2013 12:12 AM, Phil Race wrote:
If I correctly understand the original problem it was that the restoration of an X Error Handler was expected to be to the original one installed by the XToolkit but there is no guarantee of that, so the essence of this fix is that we install our error handlers as we need them but then RESTORE_XERROR_HANDLER() is a bit of a misnomer since it really leaves the handler installed (as far as X11 is concerned) and in the Java code simply discardscurrent_error_handler. Then if an error occurs the Java code will fall through to SAVED_XERROR_HANDLER() which just ignores it. I suppose this policy relies on the assumption that no other code has a "more important" need to have its X error handler installed, so we have no obligation to restore it after we are done.I wonder if this is the right thing to do if we are embedded in another application.And I am not sure what the point is of saved_error_handler in XErrorHandlerUtil.java since you never use it. 561 JNIEnv* env = (JNIEnv*)JNU_GetEnv(jvm, JNI_VERSION_1_2); 562 AWT_LOCK(); 563 jboolean xShmAttachResult = TryXShmAttach(env, awt_display, shminfo); 564 AWT_UNLOCK(); embedding these declarations in the middle of the function may trigger a C compiler warning or error depending on the compiler. Best to move the declarations. Same in the other file.I'm curious, why did you add the AWT_LOCK/AWT_UNLOCK which was not there before? It may have been considered 'harmless' even if we already have the lock on this thread and its a Reentrant lock but it does increase the risk of deadlock, plus its got JNI up-call overhead ..but we seem to have a ton of that anyway. -phil. On 3/26/2013 5:40 AM, Anton Litvinov wrote:Hello,Please review the following fix for a bug. The fix passed 3 cycles of review by AWT development team. Artem Ananiev and Anthony Petrov approved it. But because the fix modifies also Java 2D Graphics code, review by 2D Graphics development team is necessary. New "webrev.04" was generated to resolve problem in not smooth patching of the latest version of the file "src/solaris/native/sun/java2d/opengl/GLXSurfaceData.c".Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8005607 Webrev: http://cr.openjdk.java.net/~alitvinov/8005607/webrev.04 Thank you, Anton On 2/20/2013 5:43 PM, Artem Ananiev wrote:Looks fine. Thanks, Artem On 2/18/2013 8:08 PM, Anton Litvinov wrote:Hello Artem, Could you please review a new version of the fix. The method "XErrorHandlerUtil.getDisplay()" was removed and "XErrorHandlerUtil.XSync()" method refers to the field "display" directly now. Webrev: http://cr.openjdk.java.net/~alitvinov/8005607/webrev.03 Thank you, Anton On 2/18/2013 4:23 PM, Artem Ananiev wrote:On 2/18/2013 4:04 PM, Anton Litvinov wrote:Hello Artem, Thank you very much for the review of this fix. My responses to your questions are provided below in the same order, which you defined. 1. I think that "XErrorHandlerUtil.saved_error" field can surely be marked as private, but in this case the corresponding"XErrorHandlerUtil.getSavedError" method will be necessary, becausethis field is actively accessed from other classes which set a certain instance of XErrorHandler. For example"MotifDnDDropTargetProtocol.java", "XDragSourceProtocol.java" and afew other classes edited in this fix.OK, I missed that usages when looking at the webrev. Let it stay unchanged now.2. Yes, I completely agree that "XErrorHandlerUtil.getDisplay()" is reduntant. This method will be eliminated.Thanks, ArtemThank you, Anton On 2/18/2013 3:16 PM, Artem Ananiev wrote:Hi, Anton, a few minor comments: 1. XErrorHandlerUtil: can saved_error be private instead of package protected? 2. XErrorHandlerUtil.getDisplay() seems to be redundant. In general, the fix looks perfectly fine to me. Please, wait for comments from Java2D team, though. Thanks, Artem On 2/13/2013 8:57 PM, Anton Litvinov wrote:Hello Anthony, Could you please review the third version of the fix containing modifications discussed with you in the previous letter. Webrev: http://cr.openjdk.java.net/~alitvinov/8005607/webrev.02 This version of the fix differs from the previous in the following places: 1. A comment about the place of invocation of the method"XErrorHandlerUtil.init" was added to a documentation block of themethod. 2. A code related to XShmAttach function common to the files "src/solaris/native/sun/awt/awt_GraphicsEnv.c" and "src/solaris/native/sun/java2d/x11/X11SurfaceData.c" was extracted into a separate function "TryXShmAttach" declared in "src/solaris/native/sun/awt/awt_GraphicsEnv.h" file. 3. All JNI code related to X error handling was implemented as corresponding macros defined in "src/solaris/native/sun/awt/awt_util.h" file. Thank you, Anton On 1/31/2013 7:42 PM, Anton Litvinov wrote:Hello Anthony,Thank you for the review and these remarks. Surely, the comment willbe added. I think that all JNI code related to XShmAttach can bedefinitely transferred into a separate dedicated function, which will be declared in "src/solaris/native/sun/awt/awt_GraphicsEnv.h" file. I will try to wrap all JNU calls connected with XErrorHandler into the particular "WITH_XERROR_HANDLER", "RESTORE_XERROR_HANDLER" functionsor macros. Thank you, Anton On 1/31/2013 4:57 PM, Anthony Petrov wrote:Hi Anton, A couple comments: 1. src/solaris/classes/sun/awt/X11/XErrorHandlerUtil.java80 private static void init(long display) {This method is private and isn't called from anywhere in this class itself. This looks confusing. Please add a comment stating that thismethod is invoked from native code, and from where exactly.2. Interesting that we use this machinery to call the XShmAttach()from native code twice, and the code looks quite similar in eachcase. Would it be possible to extract the common code in a separate function (a-la BOOL TryXShmAttach(...)) to avoid code replication? There are other usages as well, so we could also introduce a macro(such as the old EXEC_WITH_XERROR_HANDLER but now with otherarguments) that would minimize all the JNU_ calls required to usethis machinery. Otherwise the fix looks great. -- best regards, Anthony On 1/30/2013 20:14, Anton Litvinov wrote:Hello Anthony, Could you, please, review a second version of the fix, which isbased on an idea of reusing the existing AWT native global errorhandler from Java 2D native code. Webrev: http://cr.openjdk.java.net/~alitvinov/8005607/webrev.01 The fix consists of the following parts:1. Migration of all X error handling code from XToolkit to a newXErrorHandlerUtil class for resolution of interdependency between a static initialization block of XToolkit and a block initializingjava.awt.GraphicsEnvironment singleton. Such dependency iscreated by new calls to XToolkit static methods from "src/solaris/native/sun/awt/awt_GraphicsEnv.c", "src/solaris/native/sun/java2d/x11/X11SurfaceData.c" files. 2. Substitution of XToolkit.WITH_XERROR_HANDLER, XToolkit.RESTORE_XERROR_HANDLER ... for corresponding methods,fields of XErrorHandlerUtil class in all places of JDK sourcecode, where they were used.3. Substitution of all found native X error handlers which areset in native code (awt_GraphicsEnv.c, X11SurfaceData.c, GLXSurfaceData.c) for new synthetic Java error handlers. 4. Removal of X error handling code used by the native error handlers from "solaris/native/sun/awt/awt_util.c" "solaris/native/sun/awt/awt_util.h" files. Thank you, Anton On 1/11/2013 3:45 PM, Anthony Petrov wrote:I'm not Jim, but as I indicated earlier my opinion is that the easiest way to fix this is to install the existing J2DXErrHandler() only once. That is, it is the second option listed by you. Ofcourse, the J2DXErrHandler needs to be updated as well to detect whether 2D code wants to use it at the moment or it must simply delegate to the previous handler (i.e. where the code currentlyinstalls/uninstalls the handler, it must instead set a global boolean flag or something.)While the first option (reusing the existing AWT machinery) is aninteresting idea in general, I think it is complex and wouldrequire too much additional testing and bring an unjustified riskto the solution for such a basic problem. -- best regards, Anthony On 1/11/2013 14:44, Anton Litvinov wrote:Hello Jim,Thank you very much for the review and provision of a new idea ofa solution. Elimination of the logic, which sets/unsets J2DXErrHandler() for each call "XShmAttach(awt_display, &shminfo))" should effectively resolve the issue, but only in caseif all other native error handlers, which were set by the system function "XSetErrorHandler()" in JDK or in any external library,observe the rule of relaying of all events, which are not relative to them, to the previously saved error handlers. Otherwise an error generated during "XShmAttach" function call will not be handled by J2DXErrHandler(). Could you answer the following question. By settingJ2DXErrHandler() only once and forever do you mean usage of AWT global event handler "static int ToolkitErrorHandler(Display *dpy, XErrorEvent * event)" from"src/solaris/native/sun/xawt/XlibWrapper.c" with Java synthetic handlers or creation of another global native error handler withJ2DXErrHandler as native synthetic handler? Thank you, Anton On 1/10/2013 5:44 AM, Jim Graham wrote:I think I'd rather see some way to prevent double-adding thehandler in the first place as well. Since it is only ever usedon errors I also think it is OK to set it once and leave it there forever... ...jim On 1/9/13 8:08 AM, Anthony Petrov wrote:Hi Anton et al.,If I read the description of the bug correctly, specificallythis part:The problem occurs, if another thread (for example, GTK thread) isdoing the same sort of thing concurrently. This can lead tothe following situation. JVM thread: Sets J2DXErrHandler(), saves ANY_PREVIOUS_HANDLER as previous GTK thread: Sets some GTK_HANDLER, saves J2DXErrHandler() as previous JVM thread: Restores ANY_PREVIOUS_HANDLER GTK thread: Restores J2DXErrHandler() JVM thread: Sets J2DXErrHandler(), saves J2DXErrHandler() as previousIt is obvious that at this final step 2D is in an inconsistentstate. Wedon't expect to replace our own error handler (and it shouldn'thave been there in the first place).I realize that the fix you propose works around this problem.But this doesn't look like an ideal solution to me.BTW, IIRC, in JDK7 (and 6?) we decided to set the actual X11error handler only once and never replace it. All the rest of thepush_handler/pop_handler logic is now located in Java code (see XToolkit.SAVED_ERROR_HANDLER() and the surrounding logic). Ibelievethat we should somehow share this machinery with the 2D code toavoid this sort of problems. Though I'm not sure if this will eliminate this exact issue. 2D/AWT folks: any other thoughts? -- best regards, Anthony On 12/29/2012 17:44, Anton Litvinov wrote:Hello, Please review the following fix for a bug. Bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8005607 https://jbs.oracle.com/bugs/browse/JDK-8005607 Webrev: http://cr.openjdk.java.net/~alitvinov/8005607/webrev.00 The bug consists in a crash which is caused by a stack overflow for the reason of an infinite recursion in AWT native functionJ2DXErrHandler() under certain circumstances on 32-bit LinuxOS. The fix is based on introduction of the logic, which detects indirect recursive calls to J2DXErrHandler() by means of a simple counter, to J2DXErrHandler() native function. Such a solution requires minimumcode changes, does not alter the handler's code significantlyand eliminates this bug. Adding 2d-dev@openjdk.java.net e-mail alias to the list of recipientsof this letter, because the edited function's name is relatedto Java2D area of JDK, despite of the fact that the edited file islocated in AWT directory. Thank you, Anton