Hmm - a bit of Google-fu and a refresher on collision odds shows my
1:9,223,372,036,854,775,808 figure is way off.

That is roughly the chance of two given hashes colliding - you'll have lots
of hashes, so the birthday paradox applies.
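
Back-of-envelope (birthday approximation, assuming uniformly distributed
64-bit hashes): with n rows the odds of at least one collision are roughly
n^2 / 2^65 - about 1 in 37 million at a million rows, but around 2.7% once
you reach a billion rows.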

The basic idea might be sound, though - and even if you stick with a full
MD5 it should save a chunk of storage. In SQL it would look something like
the sketch below.
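
A minimal sketch, assuming a main table blah(uniquecol, extracol) as in my
earlier message; hash64() is a hypothetical application-defined function
returning the first 64 bits of the MD5 of its argument:

CREATE TABLE blah (
    uniquecol TEXT,
    extracol  TEXT
);

CREATE TABLE hashes (
    hash INTEGER PRIMARY KEY  -- alias for the rowid: ~8 bytes plus overhead
);

-- SQLite has no built-in MD5, so hash64() would have to be registered
-- from the application via sqlite3_create_function().
CREATE TRIGGER blah_dedup BEFORE INSERT ON blah
BEGIN
    INSERT INTO hashes(hash)
    VALUES (hash64(NEW.uniquecol || NEW.extracol));
END;

A duplicate hash violates the primary key, which aborts the whole statement,
including the triggering insert into blah. The caveat is that a rare false
positive would reject a genuinely new row - space traded for a sliver of
correctness.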

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit -Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 1 March 2017 at 22:13, Paul Sanderson <sandersonforens...@gmail.com>
wrote:

> As a bit of an off-the-wall suggestion, you could try an MD5 (or even a
> partial MD5 - half of the bytes):
>
> CREATE TABLE hashes (
>     hash INTEGER PRIMARY KEY -- just the first 64 bits of the hash of
>                              -- uniquecol and extracol
> );
>
> As an INTEGER PRIMARY KEY the hash would be an alias of the rowid, so
> storage would be 8 bytes plus overhead.
>
> The chance of a random collision based on a 64-bit hash would be (I think)
> 1:9,223,372,036,854,775,808.
>
> MD5 is cryptographically broken, but it would work OK for this.
>
> use a trigger to abort the insert into blah if the insert into hashes
> fails.
>
>
> Paul
> www.sandersonforensics.com
> skype: r3scue193
> twitter: @sandersonforens
> Tel +44 (0)1326 572786
> http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit -Forensic Toolkit for SQLite
> email from a work address for a fully functional demo licence
>
> On 1 March 2017 at 21:54, Simon Slavin <slav...@bigfraud.org> wrote:
>
>>
>> On 1 Mar 2017, at 9:41pm, Deon Brewis <de...@outlook.com> wrote:
>>
>> > Yeah, OK, but that is paltry compared with the GBs of disk space that
>> > the actual second index takes up. But thanks for clarifying.
>>
>> Ah.  If it’s really GBs of disk space then I can see why you’d look for
>> alternative solutions.
>>
>> But I have a 43 GB database file which could be 20 GB without an extra
>> index. I could have written that extra check in code, and reduced the
>> file size, but I decided not to: once I had developed procedures to
>> handle a 20 GB file, I might as well be dealing with a 43 GB file anyway.
>>
>> Simon.
>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
