On 20/05/2024 10:12 pm, Aman Sharma wrote:
Hi David,


 > How did you try to intercept them? Hidden classes are not "loaded" in
the normal sense so won't trigger class load events.


I could not intercept them. I only see them when I pass `-verbose:class` in the Java CLI.

Yes that is why I asked how you tried to intercept them.


I also couldn't intercept them using JVMTI Class File Load Hook <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassFileLoadHook> event. However JEP 371 suggests that it should be possible to intercept them using JVMTI Class Load <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassLoad> event, but I won't have the bytecode at this stage. So is there no way to get its bytecode before it is linked and initialized in the JVM?

Hidden classes are not loaded so I would not expect any class load events. However, the exact nature of the JVMTI class load event is unclear, as it talks about "class or interface creation", which is neither loading nor defining per se. But a class prepare event sounds like it should be issued. However, neither gives you access to the bytecode of the class AFAICS.

David
-----



Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
http://www.kth.se
https://www.kth.se/profile/amansha
https://algomaster99.github.io/
------------------------------------------------------------------------
*From:* David Holmes <david.hol...@oracle.com>
*Sent:* Monday, May 20, 2024 2:59:17 AM
*To:* Aman Sharma; liangchenb...@gmail.com
*Cc:* core-libs-dev@openjdk.org; leyden-...@openjdk.org
*Subject:* Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`
On 17/05/2024 9:43 pm, Aman Sharma wrote:
Hi Chen,

  > java.lang.invoke.LambdaForm$MH/0x00000200cc000400

I do see this as output when I pass -verbose:class. However, based on my experiments, I have seen that neither an agent passed via 'javaagent' nor an agent passed via 'agentpath' is able to intercept this hidden class.

How did you try to intercept them? Hidden classes are not "loaded" in
the normal sense so won't trigger class load events.

Also, I was a bit confused since I saw somewhere that the names of hidden classes are null. But thanks for clarifying here.

The JEP clearly defines the name format for hidden classes - though the
final component is VM specific (and typically a hashcode).

https://openjdk.org/jeps/371

Cheers,
David
-----

  > avoid dynamic class loading

I don't see dynamic class loading as a problem. I only mind some unstable generation aspects of them which make it hard to verify them based on an allowlist.

For example, if this hidden class is generated with the exact same name and the exact same bytecode during runtime as well, it would be easy to verify it. However, I do see the names are based on some sort of memory address, and I don't know what bytecode it has, so I don't have suggestions to make them stable as of now. For Proxy classes, I feel it can be addressed unless you disagree or someone involved in Project Leyden does. :) Thank you for forwarding my mail there.

Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
https://algomaster99.github.io/

------------------------------------------------------------------------
*From:* liangchenb...@gmail.com <liangchenb...@gmail.com>
*Sent:* Friday, May 17, 2024 1:23:58 pm
*To:* Aman Sharma <aman...@kth.se>
*Cc:* core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>; leyden-...@openjdk.org <leyden-...@openjdk.org>
*Subject:* Re: Deterministic naming of subclasses of `java/lang/reflect/Proxy`

Hi Aman,
For `-verbose:class`, it's a JVM argument instead of a program argument; so when you run a java program like `java Main`, you should call it as `java -verbose:class Main`.
When done correctly, you should see hidden class outputs like:
[0.032s][info][class,load] java.lang.invoke.LambdaForm$MH/0x00000200cc000400 source: __JVM_LookupDefineClass__
The loading of java.lang.invoke hidden classes requires your program to use MethodHandle features, like a lambda.
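
As a small illustration (not part of the original message), the following sketch shows one way to trigger such a class from user code. It assumes JDK 15 or later, where lambda classes are defined as hidden classes; the class name `HiddenLambdaDemo` is made up for this example, and the exact name printed is VM specific.

```java
public class HiddenLambdaDemo {

    // Returns the runtime class that backs a lambda expression.
    static Class<?> lambdaClass() {
        Runnable r = () -> { };
        return r.getClass();
    }

    public static void main(String[] args) {
        Class<?> c = lambdaClass();
        // On JDK 15+ the class is hidden and its name contains a '/'
        // followed by a VM-specific suffix.
        System.out.println(c.getName() + " hidden=" + c.isHidden());
    }
}
```

Running it as `java -verbose:class HiddenLambdaDemo` should also list the lambda class and, depending on the JDK build, some `java.lang.invoke` hidden classes in the class-load log.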

I think the problem you are exploring, avoiding dynamic class loading and effectively closing the Java Platform for security, is also being addressed by Project Leyden (as I shared initially). Thus, I am forwarding this to leyden-dev, so you can see what approach Leyden uses to accomplish the same goal as yours.

Regards, Chen Liang

On Fri, May 17, 2024 at 4:40 AM Aman Sharma <aman...@kth.se> wrote:


     Hi Roger,


     Do you have ideas on how to intercept them? Neither my javaagent
     nor a JVMTI agent passed using the `agentpath` option is able to.
     They also do not seem to show up in logs when I pass `-verbose:class`.


     Also, what do you think of renaming the proxy classes as suggested
     below?


     Regards,
     Aman Sharma

     PhD Student
     KTH Royal Institute of Technology
     School of Electrical Engineering and Computer Science (EECS)
     Department of Theoretical Computer Science (TCS)
     http://www.kth.se
     https://www.kth.se/profile/amansha
     https://algomaster99.github.io/
     ------------------------------------------------------------------------
     *From:* core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of
     Roger Riggs <roger.ri...@oracle.com>
     *Sent:* Friday, May 17, 2024 4:57:46 AM
     *To:* core-libs-dev@openjdk.org
     *Subject:* Re: Deterministic naming of subclasses of
     `java/lang/reflect/Proxy`
     Hi Aman,

     You may also run into hidden classes (JEP 371: Hidden Classes) that
     allow classes to be defined, at runtime, without names.
     It has been proposed to use them for generated proxies but that
     hasn't been implemented yet.
     There are benefits to having nameless classes: because they can't be
     referenced by name, only as a capability, they can be better
     encapsulated.

     fyi, Roger Riggs


     On 5/16/24 8:11 AM, Aman Sharma wrote:

     Hi,


     Thanks for your response, Liang!


     > I think you meant CVE-2021-42392 instead of 2022.


     Sorry for the error. I indeed meant CVE-2021-42392
     <https://nvd.nist.gov/vuln/detail/cve-2021-42392>.


     > Leyden mainly avoids this unstable generation by performing a
     training run to collect classes loaded


     Would love to know the details of Project Leyden and how they
     worked so far to focus on this goal. In our case, the training run
     is the test suite.


     > GeneratedConstructorAccessor is already retired by JEP 416 [2]
     in Java 18


     I did see them not appearing in my allowlist when I ran my study
     subject (Apache PDFBox) with Java 21. Thanks for letting me know
     about this JEP. I see they are re-implemented with method handles.


     > How are you checking the classes?


     To detect runtime generated code, we have a javaagent that is hooked
     statically to the test suite execution. It gives us all classes
     that are loaded after the JVM and the javaagent are loaded. So
     we only check the classes loaded for the purpose of running the
     application. This is also why we did not choose -agentlib, as it
     would also give the classes loaded while setting up the JVM and the
     javaagent, whereas the user of our tool only needs the classes
     they load.


     Next, we have a `ClassFileTransformer` hook in the agent where we
     produce the checksum using the bytecode. We then compare the
     checksum with the one existing in the allowlist. The checksum
     computation algorithm is the same for both steps. Let me describe
     how I compute the checksum.


      1. I get the CONSTANT_Class_info
         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1>
         entry corresponding to `this_class` and rewrite the
         corresponding CONSTANT_Utf8_info
         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7>
         to a fixed String constant, say "foo".
      2. Since the name of the class is used to refer to its type's
         members (fields/methods), I get all CONSTANT_Fieldref_info
         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2>
         entries and, if the `class_index` of an entry corresponds to
         the old `this_class`, we rewrite the UTF8 value of class_index
         to the same constant "foo".
      3. Next, since the names of the fields in Proxy classes are
         also suffixed by numbers, for example, `private static Method
         m4`, we rewrite the UTF8 value of name in the
         CONSTANT_NameAndType_info
         <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
      4. These fields can also have a random order so we simply sort
         the entire byte code using `Arrays.sort(byte[])` to eliminate
         any differences due to ordering of fields/methods.
      5. Simply sorting the byte array still had minute differences. I
         could not understand why they existed even though values in
         constant pool of the bytecode in allowlist and at runtime were
         exactly the same after rewriting. The differences existed in
         the bytes of the Code attribute of methods. I concluded that
         the bytes stored some position information. To avoid this, I
         created a subarray where I considered the bytes corresponding
         to `CONSTANT_Utf8_info.bytes` only. Computing a checksum for
         it resulted in the same checksums for both classfiles.
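
To make step 5 concrete, here is a minimal sketch (added for illustration, not the code used in the tool) that walks the constant pool of a class file, collects the bytes of every CONSTANT_Utf8_info entry, and hashes them with SHA-256. The class name `Utf8Checksum` and its methods are hypothetical. It skips the renaming of steps 1 to 3 and, as an assumption of this sketch, sorts the Utf8 entries as whole strings rather than sorting the raw byte array as in step 4. The input is the same kind of byte array a `ClassFileTransformer` receives as `classfileBuffer`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HexFormat;
import java.util.List;

public class Utf8Checksum {

    /** Collects every CONSTANT_Utf8_info payload, one char per raw byte. */
    static List<String> utf8Entries(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readUnsignedShort();               // minor_version
            in.readUnsignedShort();               // major_version
            int count = in.readUnsignedShort();   // constant_pool_count
            List<String> out = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1 -> {                   // Utf8: u2 length + bytes
                        byte[] b = new byte[in.readUnsignedShort()];
                        in.readFully(b);
                        // ISO-8859-1 keeps the raw bytes one-to-one.
                        out.add(new String(b, StandardCharsets.ISO_8859_1));
                    }
                    case 3, 4, 9, 10, 11, 12, 17, 18 -> in.skipBytes(4);
                    case 5, 6 -> { in.skipBytes(8); i++; } // Long/Double use two slots
                    case 7, 8, 16, 19, 20 -> in.skipBytes(2);
                    case 15 -> in.skipBytes(3);   // MethodHandle
                    default -> throw new IOException("unknown constant pool tag " + tag);
                }
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** SHA-256 over the sorted Utf8 entries, each terminated by a 0 byte. */
    static String checksum(byte[] classFile) {
        List<String> entries = utf8Entries(classFile);
        Collections.sort(entries);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String e : entries) {
                md.update(e.getBytes(StandardCharsets.ISO_8859_1));
                md.update((byte) 0);
            }
            return HexFormat.of().formatHex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The 0 byte works as a separator because the class file's modified UTF-8 never contains a raw 0 byte. Two class files whose Utf8 entries differ only in order then produce the same checksum.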

     Let's understand the whole approach with an example of a Proxy class.

     `
     public final class $Proxy42 extends Proxy implements org.apache.logging.log4j.core.config.plugins.Plugin {
     `

     This will go in the allowlist as "Proxy_Plugin: <SHA256 checksum>".

     When the same class is intercepted at runtime, say "$Proxy10", we
     look for "Proxy_Plugin" in the allowlist and since the checksum
     algorithm is same in both cases, we get a match and let the class
     load.
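
For illustration only, a key like "Proxy_Plugin" could be derived from the interfaces a proxy class implements, independent of its numeric suffix. The sketch below is an assumption about how such a lookup key might be built; `ProxyKey` and `allowlistKey` are hypothetical names, not taken from the tool.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.stream.Collectors;

public class ProxyKey {

    /** Builds a name-independent key such as "Proxy_Runnable" for proxy classes. */
    static String allowlistKey(Class<?> cls) {
        if (!Proxy.isProxyClass(cls)) {
            return cls.getName();   // ordinary classes keep their own name
        }
        // getInterfaces() returns the proxied interfaces in creation order.
        return "Proxy_" + Arrays.stream(cls.getInterfaces())
                .map(Class::getSimpleName)
                .collect(Collectors.joining("_"));
    }

    public static void main(String[] args) {
        InvocationHandler noop = (proxy, method, methodArgs) -> null;
        Object p = Proxy.newProxyInstance(ProxyKey.class.getClassLoader(),
                new Class<?>[] {Runnable.class}, noop);
        // The generated name varies; the key does not.
        System.out.println(p.getClass().getName() + " -> " + allowlistKey(p.getClass()));
    }
}
```

Simple names can collide across packages, so a real key would probably need the fully qualified interface names.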

     This approach has seemed to work well for Proxy classes, Generated
     Constructor Accessor (which is removed as you said). I also looked
     at the species generated by method handles. I did not notice any
     modification in them. Their name generation seemed okay to me. If
     some new Species are generated, it is of course detected since it
     is not in the allowlist.

     I have not looked into LambdaMetafactory because I did not
     encounter it as a problem so far, but I am aware its name
     generation is also unstable. I have run my approach on only a few
     projects. And for hidden classes, I assume the agent
     won't be able to intercept them, so detecting them would be really
     hard.


     Regards,
     Aman Sharma

     PhD Student
     KTH Royal Institute of Technology
     School of Electrical Engineering and Computer Science (EECS)
     Department of Theoretical Computer Science (TCS)
     https://www.kth.se/profile/amansha
     https://algomaster99.github.io/
     ------------------------------------------------------------------------
     *From:* liangchenb...@gmail.com <liangchenb...@gmail.com>
     *Sent:* Thursday, May 16, 2024 5:52:03 AM
     *To:* Aman Sharma; core-libs-dev
     *Cc:* Martin Monperrus
     *Subject:* Re: Deterministic naming of subclasses of
     `java/lang/reflect/Proxy`
     Hi Aman,
     I think you meant CVE-2021-42392 instead of 2022.

     For your approach of an "allowlist" for Java runtime, project
     Leyden is looking to generate a static image [1], that
     > At run time it cannot load classes from outside the image, nor
     can it create classes dynamically.
     Leyden mainly avoids this unstable generation by performing a
     training run to collect classes loaded and even object graphs; I
     am not familiar with the details unfortunately.

     Otherwise, the Proxy discussion belongs on core-libs-dev,
     as java.lang.reflect.Proxy is part of Java's core libraries, so I
     am sending this reply to core-libs-dev.

     For your perceived problem that classes don't have unique names,
     your description sounds dubious: GeneratedConstructorAccessor is
     already retired by JEP 416 [2] in Java 18, and there are many
     other cases in which JDK generates classes without stable names,
     notoriously LambdaMetafactory (Gradle wished for cacheable
     Lambdas); the same applies for the generated classes for
     MethodHandle's LambdaForms (which carries implementation code for
     LambdaForm). How are you checking the classes? It seems you are
     not checking hidden classes. Proxy and Lambda classes are defined
     by the caller's class loader, while LambdaForms are under JDK's
     system class loader I think. We need to ensure you are correctly
     finding all unstable classes before we can proceed.

     [1]: https://openjdk.org/projects/leyden/notes/01-beginnings
     [2]: https://openjdk.org/jeps/416

     On Wed, May 15, 2024 at 7:00 PM Aman Sharma <aman...@kth.se> wrote:

         Hi,


         My name is Aman and I am a PhD student at KTH Royal Institute
         of Technology, Stockholm, Sweden. I do research as part of the
         CHAINS <https://chains.proj.kth.se/> project to strengthen the
         software supply chain of multiple ecosystems. I particularly
         focus on runtime integrity in Java. In this email, I want to
         write about an issue I have discovered with /dynamic
         generation of `java.lang.reflect.Proxy` classes/. I will
         propose a solution and would love to hear the feedback from
         the community. Let me know if this is the correct mailing list
         for such discussions. It seemed the most relevant from this
         list <https://mail.openjdk.org/mailman/listinfo>.


         *My research*


         Java has features to load classes on the fly - it can either
         download or generate a class at runtime. These features are
         useful for the inner workings of the JDK, for example,
         implementing annotations, reflective access, etc. However, these features
         have also contributed to critical vulnerabilities in the past
         - CVE-2021-44228  (log4shell), CVE-2022-33980, CVE-2022-42392.
         All of these vulnerabilities have one thing in common - /a
         class that was not known during build time was
         downloaded/generated at runtime and loaded into JVM./


         To defend against such vulnerabilities, we propose a solution
         to /allowlist classes for runtime/. This allowlist will
         contain an exhaustive list of classes that can be loaded by
         the JVM and it will be enforced at runtime. We build this
         allowlist from three sources:

          1. All classes of all modules provided by the Java Standard
             Library. We use ClassGraph
             <https://github.com/classgraph/classgraph
<https://github.com/classgraph/classgraph>> to scan the JDK.
          2. We can take the source code and all dependencies of an
             application. We use a software bill of materials to get
             all the data.
          3. Finally, we run the test suite to include any runtime
             downloaded/generated classes.
         Such a list is able to prevent the above 3 CVEs because it
         does not let "unknown" bytecode be loaded.
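
As a rough sketch of what enforcing such an allowlist could look like (an illustration under assumptions, not the actual implementation), the check below maps class names to SHA-256 checksums and rejects any class whose name is missing or whose bytes hash differently. `AllowlistCheck`, `sha256`, and `isAllowed` are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;

public class AllowlistCheck {

    /** Hex-encoded SHA-256 of the given class file bytes. */
    static String sha256(byte[] classFile) {
        try {
            return HexFormat.of().formatHex(
                    MessageDigest.getInstance("SHA-256").digest(classFile));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True only if the class is listed and its bytes match the recorded checksum. */
    static boolean isAllowed(Map<String, String> allowlist, String className, byte[] classFile) {
        String expected = allowlist.get(className);
        return expected != null && expected.equals(sha256(classFile));
    }
}
```

A class that was never seen while building the list, or whose bytecode changed, fails the check in the same way.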

         *Problem with generating such an allowlist*
         The first two parts of the allowlist are easy to get. The
         problem is with the third step where we want to allowlist all
         the classes that could be downloaded or generated. Upon
         running the test suite and hooking to the classes it loads, we
         observe that the list consists of classes that are called
         "com/sun/proxy/$Proxy2",
         "jdk/internal/reflect/GeneratedConstructorAccessor3" among
         many more. The purpose of these classes can be identified. The
         proxy class is created to implement an annotation. The
         accessor gives the JVM access to the constructor of a class.

         When enforcing this allowlist at runtime, we see that the
         bytecode content for "com/sun/proxy/$Proxy2" differs in the
         allowlist and at runtime. In our case, we are experimenting
         with pdfbox <https://github.com/apache/pdfbox>, so we created
         the allowlist using its test suite. Then we enforced this
         allowlist while running some of its subcommands. However,
         there was some other proxy class, say "com/sun/proxy/$Proxy5",
         at runtime that implemented the same interfaces and had the
         same methods as "com/sun/proxy/$Proxy2" in the allowlist. They
         only differed in the name of the class, the order of fields, and
         the types in field references. This could happen because the
         order of class loading is workload dependent, but it
         causes problems when generating such an allowlist.

         *Solution*


         We propose that naming of subclasses of
         "java/lang/reflect/Proxy" should not be dependent upon the
         order of loading. In order to do so, two issues can be fixed:

          1. The naming of the class should not be based on AtomicLong
             <https://github.com/openjdk/jdk/blob/b687aa550837830b38f0f0faa69c353b1e85219c/src/java.base/share/classes/java/lang/reflect/Proxy.java#L531>.
             Rather it could be named based on the interfaces it
             implements. I also wonder why AtomicLong is chosen in the
             first place.
          2. Methods of the interfaces must be in a particular order.
             Right now, they are not sorted in any particular order
             <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178>.


         These fixes will make proxy class generation deterministic
         with respect to order of loading and won't be flagged at
         runtime since the test suite would already detect them.
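
To illustrate the two proposed fixes, here is a hedged sketch rather than a patch against `Proxy.java`. It derives a name from the interface list instead of a counter, and it orders interface methods by name and descriptor. `StableProxyNaming`, `stableName`, and `sortedMethods` are made-up names, and hashing the interface names is only one possible scheme.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HexFormat;
import java.util.List;

public class StableProxyNaming {

    /** Name derived from the ordered interface list, not from load order. */
    static String stableName(Class<?>... interfaces) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Class<?> i : interfaces) {
                md.update(i.getName().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);   // separator between interface names
            }
            // The first 8 digest bytes (16 hex chars) keep the name short.
            return "$Proxy_" + HexFormat.of().formatHex(md.digest(), 0, 8);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** JVM method descriptor, e.g. "(I)C" for charAt(int). */
    static String descriptor(Method m) {
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                .toMethodDescriptorString();
    }

    /** Public methods of an interface in a fixed order: name, then descriptor. */
    static List<Method> sortedMethods(Class<?> iface) {
        return Arrays.stream(iface.getMethods())
                .sorted(Comparator.comparing(Method::getName)
                        .thenComparing(m -> descriptor(m)))
                .toList();
    }
}
```

The interface order is kept as given because it is part of a proxy class's identity. A real change would also have to account for the same interfaces being proxied under different class loaders.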

         I would love to hear from the community about these ideas. If
         in agreement, I would be happy to produce a patch. I have
         discovered this issue with subclasses of
         GeneratedConstructorAccessor
         <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/reflect/ConstructorAccessor.java>
         as well and I imagine it will also apply to some other runtime
         generated classes. If you disagree, please let me know as
         well. It helps with my research.

         I also have PoCs for the above CVEs
         <https://github.com/chains-project/exploits-for-sbom.exe> and
         a proof-of-concept tool is being developed under the name
         sbom.exe <https://github.com/chains-project/sbom.exe> in case
         anyone wonders about the implementation. I would also be
         happy to explain more.

         Regards,
         Aman Sharma

         PhD Student
         KTH Royal Institute of Technology
         School of Electrical Engineering and Computer Science (EECS)
         Department of Theoretical Computer Science (TCS)
         https://www.kth.se/profile/amansha
         https://algomaster99.github.io/



