Hello,

On 2019-08-01 12:50 a.m., Stephane Chazelas wrote:
2019-07-31 22:36:18 -0500, Peng Yu:

Suppose that I know an md5sum that is derived from one of the
timestamps computed below. Is there a way to quickly derive what the
original timestamp is? I could make a database of all the timestamps
and their md5sums. But as the total number of entries increases, this
solution will not scale, as the database can get big. Is there any
better solution to this problem?

for i in {1..2563200}; do date -d "-$i minutes" +%Y%m%d_%I%M%p; done
[...]

seq -f '-%g minutes' 2563200 | date -f - +%Y%m%d_%I%M%p

would be an improvement as it would only run one date
invocation, but you'd still need to run one md5sum for each of
those lines. coreutils md5sum in itself is not slow, but forking
a process, loading a command and linking its libraries is;
that's not a bug in coreutils itself.


"datamash" will calculate md5 on multiple lines in one invocation:

   $ seq -f '-%g minutes' 2563200 \
       | date -f - +%Y%m%d_%I%M%p \
       | datamash md5 1

or to see the time AND the md5 sum, add "--full":

   $ seq -f '-%g minutes' 2563200 \
       | date -f - +%Y%m%d_%I%M%p \
       | datamash --full md5 1

Three notes:
1.
I would recommend using the "-%7.0f minutes" format in "seq"
instead of "%g", as the latter switches to scientific notation
for large values:

   $ seq -f '-%7g minutes' 2563200 | tail -n1
   -2.5632e+06 minutes

   $ seq -f '-%7.0f minutes' 2563200 | tail -n1
   -2563200 minutes

2.
Using "-N minutes" as a date string is relative to the current time.
Are you sure that's the value you want? You'll get different values
every time you run it...
To be more reproducible, consider starting with a known date, e.g.:

   $ date -u  -d "2019-08-01 01:53:22Z +55 minutes" +%Y%m%d_%I%M%p
   20190801_0248AM

or
   $ seq -f "2019-08-01 01:53:22Z +%7.0f minutes" 2563200 \
       | date -u -f - +%Y%m%d_%I%M%p | head -n1
   20190801_0154AM


3.
"datamash md5" does not include the trailing newline in the md5
calculation. Be careful about this when comparing hashing results,
e.g.:

    $ echo 20190731_0848PM | md5sum
    deb75bda7f8e95d321897d181cbe2556  -

    $ printf "%s\n" 20190731_0848PM | md5sum
    deb75bda7f8e95d321897d181cbe2556  -

    $ printf "%s" 20190731_0848PM | md5sum
    d0bf332197593b7c3f6d7757f7d5754a  -

    $ printf "%s" 20190731_0848PM | datamash md5 1
    d0bf332197593b7c3f6d7757f7d5754a


---

For reference, on my old desktop it takes:

    $ time seq -f "2019-08-01 01:53:22Z +%7.0f minutes" 2563200 \
          | date -u -f - +%Y%m%d_%I%M%p \
          | datamash --full md5 1 | wc -l -c
    2563200 125596800

    real    0m14.185s
    user    0m17.739s
    sys     0m0.527s

The result is ~125MB of data - reasonable for an ad-hoc reverse
lookup table for MD5 values.
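
As a rough sketch of how such a table could be built and then queried
(START, COUNT, the md5_table.tsv file name and the hash_lines and
lookup_md5 helpers are my own illustrative choices, not anything from
coreutils or datamash; COUNT is kept small so that the slow fallback
used when datamash is not installed still finishes quickly):

```shell
#!/bin/sh
# Illustrative sketch only: START, COUNT, TABLE, hash_lines and
# lookup_md5 are assumed names chosen for this example.
export LC_ALL=C                 # make %p print AM/PM regardless of locale
START='2019-08-01 01:53:22Z'
COUNT=1440                      # one day of minutes; the thread uses 2563200
TABLE=md5_table.tsv

# Emit "timestamp<TAB>md5" per input line; the newline is not hashed.
hash_lines() {
    if command -v datamash >/dev/null 2>&1; then
        datamash --full md5 1   # one process for all lines
    else
        # Slow fallback: forks md5sum once per line.
        while IFS= read -r ts; do
            printf '%s\t%s\n' "$ts" \
                "$(printf '%s' "$ts" | md5sum | cut -d' ' -f1)"
        done
    fi
}

seq -f "$START +%.0f minutes" "$COUNT" \
    | date -u -f - +%Y%m%d_%I%M%p \
    | hash_lines > "$TABLE"

# Print the timestamp for a given hash; non-zero exit status if absent.
lookup_md5() {
    awk -F '\t' -v h="$1" \
        '$2 == h { print $1; found = 1; exit } END { exit !found }' "$TABLE"
}

# Hash a known timestamp (without newline), then reverse it via the table.
target=$(printf '%s' 20190801_0248AM | md5sum | cut -d' ' -f1)
lookup_md5 "$target"
```

The last command should print 20190801_0248AM, the "+55 minutes" entry
from note 2. For a one-off lookup the table does not need to be stored
at all: the same awk filter can read the pipeline output directly and
stop at the first match.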

If your key space gets larger, you should look into https://en.wikipedia.org/wiki/Rainbow_table .

Hope this helps,
 - assaf
