Forensics of paper docs, by, say, FBI, examined paper constituents
(ecology of trees, soil, tree-cutters, haulers, pulp mill, paper mill,
coating mill, shipping containers, distributors, sellers, buyers,
lenders), inks and their constituents, human and machine handling
and using and transmitting detritus, attempts to camouflage, divert
and hoax.

Forensics of digital docs do all this and much more from creation
to transceiving, forging, hoaxing, tracking, calling home, so forth.

Coupled with the Internet and commodiously ID'd digital processing
devices from manufacturers of programs and devices to the poor
user blocked from seeing the galore of peeping toms, diverted by
promises of privacy and comsec, sad to say, promoted by
orgs receiving funds from the manufacturers to play that very user
narcosis role, what can be done?

If a bio-hazard suit promises protection from ecological hazards,
what digital-hazard suit is available not contaminated with data
siphoning of the wearer like products tagged for sale to end of
world believers?

Crypto is trap- and back-doored and corrupt, so it is warned by
those offering an NSA career in the womb of Rosemary's Baby,
privacy is delusionary, so it is preached by those inviting into OTR
communities filled with Google-SM-informants and XXers, openness
worse deception than official secrecy, so blind-justice visionaries
reveal and beckon to get off the grid and underground deep and
dark far away from the electromagnetic spectrum -- quanta-land,
teleportation nirvana, across rivernet of Styx Stux.

Remember when cpunk seers cautioned commodiously of sinister
authorities and their villainous contractors, and encouraged heroically
to assassinate them anonymously? Remember the gradual hiring
of those seers to remain in place while aiding and abetting the
authorities as contractors to invent and promise comsec and privacy
and anonymity, generously trap- and back-doored and trojaned and
Call Homed tracing the arc of Snowden and gobs of others requiring
forensics to counter and counter-counter forensics of fora like
this, like Post-Snowden journalism enthralled with the adopting
of secure drop boxes, leak sites, secure comms, PK swaps
and signings, to camouflage long-standing lunches and briefings
with officials to agree on what can be slipped into public
perception of acceptable corruption to hide the unacceptable.

Adobe brags PDFs can simulate paper docs exactly. Indeed,
and much more forensically easy.

At 02:16 AM 2/1/2015, you wrote:
On 1/31/15, Jason McVetta <[email protected]> wrote:
> ...
> For Ubuntu users:
>
> sudo apt-get install libimage-exiftool-perl
> exiftool -a -G1
> adobe-acrobat-xi-scan-paper-to-pdf-and-apply-ocr-tutorial-ue.pdf  | less -S


per the python PDF tools, (with varied options),
 or reduced option command line pdf2txt, or pdftotext, or
   also:

strings --bytes=$varlength ... with varying --encoding= ... , for as
John mentioned, all the metadatas and annotations typically unseen,
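
a rough python stand-in for that strings sweep, in case it helps to see
the idea: pull printable runs out of raw bytes at several minimum
lengths and encodings. the names extract_strings and sweep are made up
for illustration, and only two encodings are handled ('s' for single
7-bit characters, 'l' for 16-bit little-endian), so treat it as a
sketch and not a replacement for the real tool.

```python
import re

PRINTABLE = rb"[\x20-\x7e\t]"  # printable ASCII plus tab


def extract_strings(data, min_len=4, encoding="s"):
    """Return printable runs of at least min_len characters.

    Hypothetical helper: 's' = single 7-bit chars, 'l' = 16-bit little-endian.
    """
    if encoding == "s":
        pattern = PRINTABLE + b"{%d,}" % min_len
        return [m.decode("ascii") for m in re.findall(pattern, data)]
    if encoding == "l":
        # each character is a printable byte followed by a zero byte
        pattern = b"(?:" + PRINTABLE + rb"\x00){%d,}" % min_len
        return [m.decode("utf-16-le") for m in re.findall(pattern, data)]
    raise ValueError("unsupported encoding: %r" % encoding)


def sweep(data, lengths=(4, 8), encodings=("s", "l")):
    """Run every (min_len, encoding) combination; keys record the options used."""
    return {(n, e): extract_strings(data, n, e) for n in lengths for e in encodings}
```

each key of the sweep result is one set of parse options, so the same
input can be compared across settings instead of trusting a single run.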


consider that the specific "configuration and input parsing" as a
"profile" for a given "input document" identified by "self certifying
identifier" for all of the above results in collaborative simplified
text paragraphs as a working base.

so sha256(generated corpora) == sha256(sha256(doc)  ^ sha256(config of
parse opts) ^ sha256(parse-product) )

if i use a convenient generated slang, ...
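
one possible reading of that slang in python, assuming "^" means a
bytewise XOR of the three raw sha256 digests and that the outer sha256
is taken over the XOR result. corpus_id and the helper names are
invented here for illustration, not an established scheme.

```python
import hashlib


def sha256_bytes(data):
    """Raw 32-byte SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def corpus_id(doc, parse_opts, parse_product):
    """Assumed reading: sha256(sha256(doc) ^ sha256(opts) ^ sha256(product))."""
    mixed = xor_bytes(
        xor_bytes(sha256_bytes(doc), sha256_bytes(parse_opts)),
        sha256_bytes(parse_product),
    )
    return hashlib.sha256(mixed).hexdigest()
```

note that XOR does not care about order, and two identical inputs cancel
each other out, so a stricter variant might hash a length-prefixed
concatenation of the three digests instead.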

this means at least a dozen "to text" engines with configuration,
(parse opts and parse products) per input document as a working state.

and ten to twenty times the input pages as simplified output text
paragraphs (common base) collected from the useful parts of the best
transformations, used for subsequent text based natural language
processing.
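
a small sketch of how such a common base might be collected, assuming
each engine's output is already in hand as plain text and that
paragraphs are separated by blank lines. picking the "useful parts" is
reduced here to whitespace normalization and de-duplication, which is a
simplification; common_base and the engine labels are hypothetical.

```python
import re


def normalize(paragraph):
    """Collapse runs of whitespace so trivially different outputs compare equal."""
    return re.sub(r"\s+", " ", paragraph).strip()


def common_base(outputs):
    """Merge per-engine text into unique paragraphs.

    outputs maps an engine label to its extracted text. Returns a list of
    (paragraph, [engines that produced it]) in first-seen order.
    """
    seen = {}
    for engine, text in outputs.items():
        for para in re.split(r"\n\s*\n", text):
            key = normalize(para)
            if not key:
                continue
            engines = seen.setdefault(key, [])
            if engine not in engines:
                engines.append(engine)
    return list(seen.items())
```

paragraphs that several engines agree on are probably the safer ones to
feed into later language processing, and the engine list per paragraph
keeps that provenance around.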

in a sense, this is devops come to document processing, where the
process itself is embodied in version controlled and complete archives
with self certifying integrity. this means boring, and also done
decades ago, more or less, in varying contexts. everything old is new
again ;P

there is a whole field of custom parsers and data sets and scrapers
all dedicated to variations on this theme, although sadly they don't
live public lives, for the most part.


best regards,

