As a bad photographer with several forays into the forensic world, I have
a couple of comments on a recent (and pretty interesting!) Black Hat
presentation by Neal Krawetz (www.hackerfactor.com) on image forensics:

  http://blog.wired.com/27bstroke6/files/bh-usa-07-krawetz.pdf

To make things clear: I liked it. I think it's solid. I don't want this to
be picked up by the press and turned into a food fight. I respect the
author. My point is to express my slight doubts regarding several of the
far-fetched conclusions presented later in the talk - before the approach
is relied upon to fire someone from his post or the like.

First things first: in the presentation, following an overview of some of
the most rudimentary "manual" analysis techniques, Mr. Krawetz employs
several mathematical transformations as a method to more accurately detect
image tampering. This is based on a valid high-level premise: when images
in lossy formats are repeatedly edited and recompressed, the quality of
various portions of the image will degrade in proportion to the number of
saves. If the image is composed of several previously lossy-compressed
files from various sources, their compression degradation patterns may
differ - and the current level of degradation can be quantified, in the
most rudimentary way, simply by measuring how much each compression unit
(with JPEG, an 8x8 px cell) changes with further compression - which is a
nonlinear process.
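
To make this more concrete, here is a minimal sketch of the rudimentary
"resave and diff" measurement described above - my own illustration in
Python with the Pillow library, not Mr. Krawetz's implementation; the
input file name and quality setting are arbitrary:

  # Recompress the image at a known JPEG quality and measure how much each
  # 8x8 cell (the JPEG compression unit) still changes; cells that barely
  # move have likely been through more lossy saves already.
  import io
  from PIL import Image, ImageChops

  def resave_difference(path, quality=90):
      """Per-pixel difference between an image and its recompressed copy."""
      original = Image.open(path).convert("RGB")
      buf = io.BytesIO()
      original.save(buf, format="JPEG", quality=quality)
      resaved = Image.open(buf).convert("RGB")
      return ImageChops.difference(original, resaved)

  def block_error(diff, block=8):
      """Average the difference over each 8x8 cell to get a per-cell score."""
      gray = diff.convert("L")
      w, h = gray.size
      px = gray.load()
      scores = []
      for y in range(0, h - block + 1, block):
          row = []
          for x in range(0, w - block + 1, block):
              total = sum(px[x + i, y + j]
                          for j in range(block) for i in range(block))
              row.append(total / (block * block))
          scores.append(row)
      return scores

  if __name__ == "__main__":
      diff = resave_difference("input.jpg")     # hypothetical input file
      diff.save("input_resave_diff.png")        # for visual inspection
      scores = block_error(diff)                # per-cell numbers to compare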

The property that makes this possible is known to all photographers - the
progressive degradation is the main reason why professional and "prosumer"
photo editing and delivery is done almost exclusively using
storage-intensive lossless formats, and why SLR cameras support RAW / TIFF
output (and why skilled image forgers would not use lossy formats until
they're done, or if forced to, would rescale their work and add subtle
noise to thwart analysis). I'm pretty sure the approach is used as one of
the inputs by commercial image forensics software, too - along with a
couple of other tricks, such as similarity testing to spot the use of the
clone tool.

Now, to the point: the "wow" factor associated with the presentation and
picked up by the press comes from a claim of apparent heavy manipulation
of certain publicly released pictures of Al Qaeda associates, offered as
proof of the accuracy and reliability of the automated approach - and
that's where I'm not so sure about the conclusions reached.

In essence, my issue with this is that the presentation fails to
acknowledge that observed patterns do not necessarily depend on the number
of saves alone. There are certain very common factors that play a far more
pronounced role - and in fact, some of them seem to offer a *better*
explanation of some of the artifacts observed. The two most important
ones:

  - Non-uniform subsampling: JPEG and MPEG typically employ 4:2:0 chroma
    subsampling. This means that a region where the contrast between
    objects is primarily a product of color changes (at a comparable
    intensity of reflected light) may appear to be "older" (already lower
    in frequency and contrast, producing less pronounced error difference
    patterns) than a region where the same level of contrast can be
    attributed to luminosity changes alone. Consider this example:

    http://lcamtuf.coredump.cx/subsampling.png

    ...we then compress it as a JPEG:

    http://lcamtuf.coredump.cx/subsampling.jpg

    ...and can compare the level of compression-related degradation by
    converting it to cyan-weighted BW:

    http://lcamtuf.coredump.cx/subsampling_bw.png

    I attempted to recreate the RGB "error difference" approach of Mr.
    Krawetz, resaving it again at a slightly different compression level,
    and came up with this image, which seems to suggest that only the top
    text is brand new (comparing this to the conclusions reached for
    various TV frame grabs later in his presentation, where similar
    differences in color and contrast were resolved in favor of
    manipulation):

    http://lcamtuf.coredump.cx/subsampling_nk.jpg

    Simply picking out the Y component does not help either - since the
    working space of the editor is inevitably RGB, each resave causes Cb
    and Cr resampling imprecision to spill into Y on YCbCr -> RGB -> YCbCr
    conversions, and introduces errors comparable to what we're trying to
    detect. (A small sketch reproducing this subsampling effect follows
    the list below.)

  - Quantization. JPEG quality is controlled primarily by the accuracy of
    the quantization step, which discards differences in many
    high-frequency 8x8 patterns while generally preserving low-frequency
    ones, but possibly introduces higher-frequency artifacts around more
    complex shapes - artifacts that are themselves subject to rapid
    degradation (a toy numeric illustration of this follows the list
    below). A good example of this is the following picture:

    http://blog.wired.com/photos/uncategorized/2007/08/01/ayman_alzawahiri.jpg
    http://blog.wired.com/photos/uncategorized/2007/08/01/ayman_alzawahiri_analysis.jpg

    Krawetz attributes the outline around al-Zawahiri seen in the second
    picture to chroma key manipulation, but fails to address the fact that
    the high-contrast, low-frequency edge between al-Zawahiri's black
    scarf and his white clothing produced an identical artifact. I highly
    doubt the scarf was altered, and Krawetz makes no such assumption when
    tracing the original image later on. It's still perfectly possible
    that this picture was manipulated (and a visual inspection of a thin
    black outline around his body may confirm this), but Krawetz's
    analysis does not strike me as solid evidence of such tampering
    (particularly with the banner, as suggested by Krawetz in an
    interview).

    To test for this, I took my own photo with a couple of contrasty
    areas (most certainly not a collage) and subjected it to the error
    difference treatment:

    http://lcamtuf.coredump.cx/photo/current/ula3-000.jpg
    http://lcamtuf.coredump.cx/ula3-000_nk.jpg

    Now, if you interpret the output in line with what we see on page 62
    of the presentation, you should assume that the background in the
    top-right part of the image predates the model, and that much of the
    phone and some of her nails postdate her.
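
To make the first point above (subsampling) easy to reproduce, here is a
small sketch of mine - an illustration under assumed colors, sizes, and
quality settings, using Python and the Pillow library, not a
reconstruction of Krawetz's code. It builds two synthetic edges of
comparable visual contrast, one carried almost entirely by chroma and one
by luma, saves both as 4:2:0 JPEG twice, and reports how much each still
changes on the second save:

  import io
  from PIL import Image, ImageChops, ImageStat

  def jpeg_roundtrip(img, quality=75, subsampling="4:2:0"):
      """Compress to JPEG in memory and decode back to RGB."""
      buf = io.BytesIO()
      img.save(buf, format="JPEG", quality=quality, subsampling=subsampling)
      return Image.open(buf).convert("RGB")

  def vertical_edge(left, right, size=(128, 128)):
      """Two solid halves meeting in a vertical edge."""
      img = Image.new("RGB", size, left)
      img.paste(Image.new("RGB", (size[0] // 2, size[1]), right),
                (size[0] // 2, 0))
      return img

  # Chroma-dominated contrast: red vs. green of roughly equal luminance.
  chroma_edge = vertical_edge((255, 0, 0), (0, 130, 0))
  # Luma-dominated contrast: dark gray vs. light gray.
  luma_edge = vertical_edge((60, 60, 60), (200, 200, 200))

  for name, img in (("chroma edge", chroma_edge), ("luma edge", luma_edge)):
      once = jpeg_roundtrip(img)
      twice = jpeg_roundtrip(once)
      err = ImageStat.Stat(ImageChops.difference(once, twice)).mean
      print(name, "- mean change on second save:", err)

A region whose contrast lives mostly in the subsampled chroma channels can
report a different "age" than a region with the same apparent contrast
carried by luma, which is the asymmetry described above.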

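The second point (quantization) can be illustrated with a toy 8x8 block -
again my own sketch, using a made-up uniform quantizer rather than a real
JPEG table, and requiring numpy and scipy. A hard black/white edge is run
through the JPEG-style DCT, quantized, and reconstructed; the ripple that
appears next to the edge is the kind of high-frequency artifact that keeps
changing on every further recompression:

  import numpy as np
  from scipy.fftpack import dct, idct

  def dct2(block):
      """2D type-II DCT, as applied per 8x8 cell in JPEG."""
      return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

  def idct2(block):
      """Inverse 2D DCT."""
      return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

  # One 8x8 cell with a sharp vertical edge (think: black scarf, white cloth).
  block = np.zeros((8, 8))
  block[:, 4:] = 255.0

  # Crude uniform quantization; real encoders use a frequency-dependent table.
  step = 40.0
  coeffs = dct2(block - 128.0)
  reconstructed = idct2(np.round(coeffs / step) * step) + 128.0

  # The values near the edge no longer sit at a flat 0 / 255 - that ringing
  # is an ordinary compression artifact, not evidence of tampering by itself.
  print(np.round(reconstructed).astype(int))
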
There's also a list of other problems with the approach that may cause it
to fail in specific circumstances... non-square chroma subsampling in
certain video formats and JPEG encoders would make regions with dominant
high-frequency vertical chrominance contrast patterns degrade at a rate
different from ones with dominant horizontal patterns, especially when
resaved in 4:2:0... digital cameras produce non-linear noise, markedly
more pronounced at the bottom of the dynamic range - which may cause dark
areas to behave in a significantly different manner when reaching
"stability" on subsequent recompressions, etc.

I think the point I'm trying to make is this: it's a good idea to rely on
the manual approaches described in this paper. It's also good to learn
about many of the tools of the trade not described there, such as
pixel-level noise uniformity analysis, etc.

The ideas proposed for automated analysis, on the other hand, may be good
in some applications, but IMO are going to be hit-and-miss, with far too
many false positives to be useful in general-purpose forensics.

/mz
