[ 
https://issues.apache.org/jira/browse/MESOS-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351444#comment-14351444
 ] 

Michael Park commented on MESOS-2414:
-------------------------------------

I had a chance to look at this issue and Niklas's patch and I'd just like to 
document my thoughts here in case issues similar to this comes up again.

First off, static local variables are thread-safe as of C++11. They also have a 
"construct-on-first-use" semantics, but it's critical to note that this means 
the *initializer* of a static local variable gets called once.

Here's the original piece of code:

{code:original}
Result<jfieldID> getFieldID(/* ... */) {
  static jclass NO_SUCH_FIELD_ERROR = NULL;

  if (NO_SUCH_FIELD_ERROR == NULL) {
    NO_SUCH_FIELD_ERROR = env->FindClass("java/lang/NoSuchFieldError");
    if (env->ExceptionCheck() == JNI_TRUE) {
      return Error("Cannot find NoSuchFieldError class");
    }
  }

  /* ... */
}
{code}

In this case, the initializer of {{NO_SUCH_FIELD_ERROR}} is simply {{NULL}} 
which gets called once safely. We then proceed to run into races in the {{if 
(NO_SUCH_FIELD_ERROR == NULL)}} clause.

{{FindClass}} returns
> a class object from a fully-qualified name, or {{NULL}} if the class cannot 
> be found.

If {{FindClass}} *cannot* find the class, then we're okay. It returns {{NULL}}, 
{{NO_SUCH_FIELD_ERROR}} gets set to {{NULL}}, an exception is raised internally 
so {{if (env->ExceptionCheck() == JNI_TRUE)}} is true and we exit returning an 
{{Error}}. On subsequent invocations we'll simply keep trying this and we don't 
run into breaking issues.

If {{FindClass}} *can* find the class, we get into trouble. How? Here's a 
possible race:

1. Thread A comes along first and initializes {{NO_SUCH_FIELD_ERROR}} to 
{{NULL}}.
2. Thread A proceeds and executes {{NO_SUCH_FIELD_ERROR = 
env->FindClass("java/lang/NoSuchFieldError");}} which sets 
{{NO_SUCH_FIELD_ERROR}} to some valid pointer.
3. Thread B causes an exception to be thrown.
4. Thread A checks for (b) {{if (env->ExceptionCheck() == JNI_TRUE)}} and 
proceeds to return an {{Error}}. *Note that {{NO_SUCH_FIELD_ERROR}} is still 
set to a valid pointer!*
5. The instance that {{NO_SUCH_FIELD_ERROR}} is pointing at is 
garbage-collected (or something similar) which invalidates 
{{NO_SUCH_FIELD_ERROR}}.
6. On a subsequent call, {{NO_SUCH_FIELD_ERROR}} is set so we proceed to use it 
even though it's been invalidated. The first place we actually use 
{{{NO_SUCH_FIELD_ERROR}} is {{IsInstanceOf}} which is precisely where we were 
seg-faulting.

*Semi-related Q*: Is {{env->ExceptionCheck() == JNI_TRUE}} the correct 
predicate to check for the success of {{FindClass}}? Seems to me like we should 
go directly to checking that the result of {{FindClass}} is non-null.

I say the above is semi-related because I believe the crux of issue is that the 
initializing logic for {{NO_SUCH_FIELD_ERROR}} is not thread-safe. The 
initializing logic (i.e. the body of the {{if (NO_SUCH_FIELD_ERROR == NULL)}} 
clause) should just live in the initializer and the compiler will handle the 
thread-safety for us.

{code}
jclass findNoSuchFieldErrorClass() {
  jclass result = env->FindClass("java/lang/NoSuchFieldError");
  if (env->ExceptionCheck() == JNI_TRUE) {  // or whatever the correct 
condition should be
    return NULL;
  }
  return result;
}

Result<jfieldID> getFieldID(/* ... */) {
  static jclass NO_SUCH_FIELD_ERROR = findNoSuchFieldErrorClass();

  if (NO_SUCH_FIELD_ERROR == NULL) {
    return Error("Cannot find NoSuchFieldError class");
  }

  /* ... */
}
{code}

> Java bindings segfault during framework shutdown
> ------------------------------------------------
>
>                 Key: MESOS-2414
>                 URL: https://issues.apache.org/jira/browse/MESOS-2414
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Niklas Quarfot Nielsen
>
> {code}
> I0226 16:39:59.063369 626044928 sched.cpp:831] Stopping framework 
> '20150220-141149-16777343-5050-45194-0000'
> [2015-02-26 16:39:59,063] INFO Driver future completed. Executing optional 
> abdication command. (mesosphere.marathon.MarathonSchedulerService:191)
> [2015-02-26 16:39:59,065] INFO Setting framework ID to 
> 20150220-141149-16777343-5050-45194-0000 
> (mesosphere.marathon.MarathonSchedulerService:75)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0000000106a266d0, pid=99408, tid=44291
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 
> 1.8.0_25-b17)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode bsd-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0x4266d0]  Klass::is_subtype_of(Klass*) const+0x4
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /Users/corpsi/projects/marathon/hs_err_pid99408.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> #
> Abort trap: 6
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to