[java code coverage] Article about JaCoCo Class Ids

Marc R. Hoffmann Fri, 14 Mar 2014 08:29:24 -0700

Hi,

as we have discussed JaCoCo class ids many times here, I added a new chapter 
about this topic to the documentation: 
http://www.eclemma.org/jacoco/trunk/doc/classids.html


Cheers,
-marc

Class Ids

As JaCoCo's class identifiers sometimes cause confusion, this chapter 
explains the concepts behind class ids and addresses common issues in FAQ 
format.

What are class ids and how are they created?

Class ids are 64-bit integer values, for example 0x638e104737889183 in hex 
notation. Their calculation is considered an implementation detail of 
JaCoCo. Currently ids are created with a CRC64 checksum of the raw class 
file.
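
To make the idea concrete, here is a minimal sketch of a table-driven CRC-64 over raw bytes. The class name ClassIdSketch and the polynomial constant are assumptions chosen for illustration; since the calculation is an implementation detail, the values computed here should not be expected to match the ids JaCoCo produces.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: a table-driven CRC-64 over the raw bytes of a class file. */
final class ClassIdSketch {

    // Assumption: a reflected CRC-64 polynomial picked for illustration. The ids
    // computed here are not guaranteed to match the ones JaCoCo produces.
    private static final long POLY = 0xd800000000000000L;
    private static final long[] TABLE = new long[256];

    static {
        // Precompute the checksum contribution of every possible byte value.
        for (int i = 0; i < 256; i++) {
            long v = i;
            for (int bit = 0; bit < 8; bit++) {
                v = (v & 1) == 1 ? (v >>> 1) ^ POLY : v >>> 1;
            }
            TABLE[i] = v;
        }
    }

    /** Computes a 64-bit id from the complete, unmodified content of a class file. */
    static long classId(byte[] classFileBytes) {
        long sum = 0;
        for (byte b : classFileBytes) {
            int index = ((int) sum ^ b) & 0xff;
            sum = (sum >>> 8) ^ TABLE[index];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] original = "stand-in for class file bytes".getBytes(StandardCharsets.UTF_8);
        byte[] changed = original.clone();
        changed[0] ^= 1; // flip a single bit
        System.out.printf("0x%016x%n", classId(original));
        System.out.printf("0x%016x%n", classId(changed));
    }
}
```

The point of the sketch is that the id depends on every byte of the file: flipping a single bit yields a different id.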

What are class ids used for?

Class ids are used to unambiguously identify Java classes. At runtime, 
execution data is recorded for every loaded class and typically stored in 
*.exec files. At analysis time (for example, for report generation) the 
class ids are used to relate the analyzed classes to the execution data.

What are the advantages of JaCoCo class ids?

The concept of class ids allows distinguishing different versions of 
classes, for example when multiple versions of an application are deployed 
to an application server or different versions of libraries are included.

Class ids are also a prerequisite for JaCoCo's minimal runtime overhead 
and small *.exec files, even for very large applications under test.

What is the disadvantage of JaCoCo class ids?

The fact that class ids identify a specific version of a class causes 
problems in setups where different classes are used at runtime and at 
analysis time.

What happens if different classes are used at runtime and at analysis time?

In this case execution data cannot be related to the analyzed classes. As a 
consequence such classes are reported with 0% coverage.
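
The lookup behind this can be sketched as follows. ExecDataLookupSketch and its methods are hypothetical names used only for illustration; they are not part of the JaCoCo API. The sketch assumes execution data is kept as one boolean array per class id, so a class whose id is not found looks as if it had never been executed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for execution data keyed by class id. */
final class ExecDataLookupSketch {

    private final Map<Long, boolean[]> probesByClassId = new HashMap<>();

    /** Called for data collected at runtime. */
    void record(long classId, boolean[] probes) {
        probesByClassId.put(classId, probes);
    }

    /** Called at analysis time with the id of the class file given to the report. */
    int executedProbes(long analyzedClassId) {
        boolean[] probes = probesByClassId.get(analyzedClassId);
        if (probes == null) {
            // Id mismatch: no execution data can be related to this class,
            // so it appears as not executed at all.
            return 0;
        }
        int executed = 0;
        for (boolean probe : probes) {
            if (probe) {
                executed++;
            }
        }
        return executed;
    }
}
```

If the class file used for analysis differs from the one loaded at runtime, its id differs as well, the lookup returns nothing, and the class ends up with no executed probes.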

How can I detect that I have a problem with class ids?

The typical symptom of a class id mismatch is that classes are not shown as 
covered although they have been executed during the test. This situation is 
easy to detect, e.g. in the HTML report: Open the *Sessions* page with the 
link in the top-right corner. You will see a list of all classes for which 
execution data has been collected. Find the class in question and check 
whether the entry has a link to the corresponding coverage report page. If 
the entry is not linked, there is a class id mismatch between the class used 
at runtime and the class provided to create the report.

What can cause different class ids?

Class ids are identical only for the exact same class file (byte by byte). 
There are several reasons why you might get different class files. First, 
compiling Java source files will result in different class files if you use 
a different tool chain:

   - Different compiler vendor (e.g. Eclipse vs. Oracle JDK)
   - Different compiler versions
   - Different compiler settings (e.g. debug vs. non-debug)

Also post-processing class files (obfuscation, AspectJ, etc.) will 
typically change the class files. JaCoCo will work well if you simply use 
the same class files for runtime as well as for analysis. So the tool chain 
to create these class files does not matter.

Even if the class files on the file system are the same, it is possible 
that the classes seen by the JaCoCo runtime agent are different anyway. This 
typically happens when another Java agent is configured *before* the JaCoCo 
agent or when special class loaders pre-process the class files. Typical 
candidates are:

   - Mocking frameworks
   - Application servers
   - Persistence frameworks

What workarounds exist to deal with runtime-modified classes?

If classes get modified at runtime in your setup, there are some workarounds 
to make JaCoCo work anyway:

   - If you use another Java agent, make sure the JaCoCo agent 
   <http://www.eclemma.org/jacoco/trunk/doc/agent.html> is specified first 
   on the command line. This way the JaCoCo agent should see the original 
   class files.
   - Specify the classdumpdir option of the JaCoCo agent 
   <http://www.eclemma.org/jacoco/trunk/doc/agent.html> and use the dumped 
   classes at report generation. Note that only loaded classes will be 
   dumped, i.e. classes not executed at all will not show up in your report 
   as not covered.
   - Use offline instrumentation 
   <http://www.eclemma.org/jacoco/trunk/doc/offline.html> before you run 
   your tests. This way classes get instrumented by JaCoCo before any 
   runtime modification can take place. Note that in this case the report 
   has to be generated with the *original* classes, not with the 
   instrumented ones.
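
The first two workarounds come down to how the JVM arguments are assembled. The following is a minimal sketch, assuming a launcher that builds its own argument list; AgentArgsSketch, the jar locations and the dump directory are hypothetical, only the -javaagent syntax and the classdumpdir option name come from the agent documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles JVM arguments for a test run. */
final class AgentArgsSketch {

    static List<String> jvmArgs(String jacocoAgentJar, String classDumpDir,
            List<String> otherAgentArgs) {
        List<String> args = new ArrayList<>();
        // The JaCoCo agent goes first so that it sees the original class files.
        // classdumpdir asks the agent to write the classes it has seen to disk.
        args.add("-javaagent:" + jacocoAgentJar + "=classdumpdir=" + classDumpDir);
        // Any other agents follow afterwards.
        args.addAll(otherAgentArgs);
        return args;
    }
}
```

The resulting list can be passed on to whatever starts the JVM under test, for example a java.lang.ProcessBuilder.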

Why can't JaCoCo simply use the class name to identify classes?

To understand why JaCoCo can't rely on class names, we need to have a look 
at how JaCoCo measures code coverage.

JaCoCo tracks execution with so-called *probes*. Probes are additional 
bytecode instructions inserted into the original class file which note when 
they are executed and report this to the JaCoCo runtime. This process is 
called *instrumentation*. To keep the runtime overhead minimal, only a few 
probes are inserted at "strategic" places. These probe positions are 
determined by analyzing the control flow 
<http://www.eclemma.org/jacoco/trunk/doc/flow.html> of all methods of a 
class. As a result, every instrumented class produces a list of n boolean 
flags indicating whether each probe has been executed or not. A JaCoCo 
*.exec file simply stores a boolean array per class id.
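
Conceptually, an instrumented class behaves roughly like the hand-written sketch below. This is a source-level illustration only: real instrumentation works on bytecode, and the probe positions shown here are assumptions, not the positions JaCoCo would actually choose.

```java
/** Source-level picture of a class with probes; not actual JaCoCo output. */
final class ProbeSketch {

    // One flag per probe. Only this array would end up in the execution data.
    final boolean[] probes = new boolean[3];

    int abs(int x) {
        if (x < 0) {
            probes[0] = true; // the negative branch was taken
            return -x;
        }
        probes[1] = true; // the fall-through path was taken
        return x;
    }

    int twice(int x) {
        probes[2] = true; // the method body was executed
        return 2 * x;
    }
}
```

Note that the array itself carries no method names or line numbers. After a call to abs(5), all that is recorded is that the flag at index 1 is set.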

At analysis time, for example for report generation, the *.exec file is 
used to get information about the probe execution status. But as probes are 
stored in a plain boolean array, there is no information such as the 
corresponding methods or lines. To retrieve this information we need the 
original class files and have to perform the exact same control flow 
analysis as at instrumentation time. Because this is a deterministic 
process, we get the same probe positions. With this information we can now 
infer the execution status of every single instruction and branch of a 
method. Using the debug information embedded in the class files we can also 
calculate line coverage.

If we used even slightly different classes at analysis time than at 
runtime (e.g. with a different method ordering or additional branches), we 
would end up with different probes. For example, the probe at index i would 
be in method a() and not in method b(). Obviously this would produce 
arbitrary coverage results.
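
The effect can be reproduced with a small sketch. It assumes a strongly simplified model in which probe indexes are assigned method by method in class-file order; ProbeLayoutSketch and this model are illustrative assumptions, not JaCoCo's actual analysis.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model: probe indexes are handed out per method, in class-file order. */
final class ProbeLayoutSketch {

    /**
     * Interprets a probe array against a layout that maps each method name
     * to its number of probes. Returns whether each method looks executed.
     */
    static Map<String, Boolean> executedMethods(
            Map<String, Integer> probeCountsInOrder, boolean[] probes) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        int index = 0;
        for (Map.Entry<String, Integer> method : probeCountsInOrder.entrySet()) {
            boolean hit = false;
            for (int i = 0; i < method.getValue(); i++) {
                if (index < probes.length && probes[index]) {
                    hit = true;
                }
                index++;
            }
            result.put(method.getKey(), hit);
        }
        return result;
    }

    public static void main(String[] args) {
        // Recorded at runtime: a() has one probe, b() has two; only b() ran.
        boolean[] recorded = {false, true, true};

        Map<String, Integer> runtimeLayout = new LinkedHashMap<>();
        runtimeLayout.put("a()", 1);
        runtimeLayout.put("b()", 2);

        // Same methods in a different order, as a slightly different class
        // file might contain them.
        Map<String, Integer> analysisLayout = new LinkedHashMap<>();
        analysisLayout.put("b()", 2);
        analysisLayout.put("a()", 1);

        System.out.println(executedMethods(runtimeLayout, recorded));
        System.out.println(executedMethods(analysisLayout, recorded));
    }
}
```

With the matching layout only b() is reported as executed. With the reordered layout the last flag is attributed to a(), so a() is reported as executed although it never ran.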

Why do I get an error when I try to analyze multiple versions of the same 
class within a group?

JaCoCo always analyzes a set of classes as a group. The group is used to 
aggregate data for source files and packages (both can contain multiple 
classes). Within the reporting API, classes are identified by their fully 
qualified name (e.g. to create stable file names in the HTML reports). 
Therefore it is not possible to include two different classes with the same 
name within a group. However, it is possible to analyze different versions 
of class files in separate groups; for example, the Ant report task 
<http://www.eclemma.org/jacoco/trunk/doc/ant.html#report> can be configured 
with multiple groups.
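
The constraint can be pictured with a small sketch. GroupSketch is a hypothetical class, not part of the reporting API; it only assumes that a group indexes its classes by fully qualified name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a report group that indexes classes by name. */
final class GroupSketch {

    private final Map<String, Long> idByClassName = new HashMap<>();

    /** Adds a class; a second, different version under the same name is rejected. */
    void add(String className, long classId) {
        Long existing = idByClassName.putIfAbsent(className, classId);
        if (existing != null && existing != classId) {
            throw new IllegalStateException(
                    "Different class with the same name already in group: " + className);
        }
    }
}
```

Two versions of the same class therefore need two separate GroupSketch instances, which mirrors the advice to put them into separate groups.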
