Wow, thank you Frank, it was very kind of you to type that all out!

 

Daniel Wolf

 

From: [email protected] [mailto:[email protected]] On 
Behalf Of Frank Ress
Sent: Tuesday, June 28, 2016 2:03 PM
To: [email protected]
Subject: RE: [NTSysADM] Compare two large lists

 

I’m a SQL guy (SQL Server of late).  You can download and install the Developer 
Edition – it’s free.  You really don’t need any of the add-on services 
(Analysis Services, Reporting Services, Integration Services…).  Just install 
the database itself and the management tools.  Bing/Microsoft will help answer 
any questions you might have regarding installation.

 

Once it’s running, open Management Studio.  Connect to your new instance and 
expand the navigator pane to see the databases.  You’ll have 4 system databases 
by default – master, model, msdb, and tempdb.

 

Create a new database (right-click the ‘Databases’ node, New Database, etc. – 
defaults for file names and locations will be fine).  Name it whatever you’d 
like.  Once the database is created, right-click that database in the navigator 
pane and pick ‘Tasks’/Import Data…  Assuming you have the hashes in a 
spreadsheet or whatever, just import both lists each into its own table using 
the import wizard.  Name the tables whatever you’d like, e.g. BigList and 
SmallList.  You’ll also give the columns with the data a name (let’s assume you 
have no other columns of info for each table, just the hashes).  You can give 
them the same name, but it’s easier if they’re unique.  Call them BigListHash 
and SmallListHash, for example.

 

Once the tables are created and populated, right-click your database in the 
navigator again and select ‘New Query’.  A new editing pane will open to the 
right of the navigator.  In the query pane, enter:

 

SELECT DISTINCT SmallListHash 

  FROM SmallList

INNER JOIN BigList ON SmallListHash = BigListHash
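
To try the join on a handful of values before loading the real lists, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The table and column names match the ones above. The sample hashes and the function name common_hashes are made up for illustration, and SQLite stands in for SQL Server, so treat this as a rough check rather than the setup described in this message.

```python
import sqlite3


def common_hashes(small, big):
    """Return the hashes present in both lists, using the join shown above."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # One single-column table per list, named as in the message.
        conn.execute("CREATE TABLE SmallList (SmallListHash TEXT)")
        conn.execute("CREATE TABLE BigList (BigListHash TEXT)")
        conn.executemany("INSERT INTO SmallList VALUES (?)", [(h,) for h in small])
        conn.executemany("INSERT INTO BigList VALUES (?)", [(h,) for h in big])
        rows = conn.execute(
            "SELECT DISTINCT SmallListHash "
            "FROM SmallList "
            "INNER JOIN BigList ON SmallListHash = BigListHash"
        ).fetchall()
    finally:
        conn.close()
    # Sort so the result order is predictable.
    return sorted(row[0] for row in rows)


if __name__ == "__main__":
    # Hypothetical sample values, not real file hashes.
    small = ["aaa111", "bbb222", "ccc333"]
    big = ["bbb222", "ddd444", "ccc333", "eee555"]
    print(common_hashes(small, big))
```

Only the values that appear in both lists come back, which is the result the query is meant to produce.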

 

You don’t need the ‘DISTINCT’ operator if there are no duplicate hash values in 
your lists.  The query would perform better without it, but using it will 
eliminate any dups that exist in the data.  Other than speed, can’t hurt to 
have it.
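
To see what DISTINCT buys you, here is a small sketch that counts the rows the join returns with and without it when both lists contain duplicates. It uses Python's built-in sqlite3 module as a stand-in for SQL Server. The function name join_row_counts and the sample values are hypothetical; the table and column names are the ones suggested above.

```python
import sqlite3


def join_row_counts(small, big):
    """Return (rows without DISTINCT, rows with DISTINCT) for the join."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE SmallList (SmallListHash TEXT)")
        conn.execute("CREATE TABLE BigList (BigListHash TEXT)")
        conn.executemany("INSERT INTO SmallList VALUES (?)", [(h,) for h in small])
        conn.executemany("INSERT INTO BigList VALUES (?)", [(h,) for h in big])
        join = "FROM SmallList INNER JOIN BigList ON SmallListHash = BigListHash"
        # Count the joined rows as-is: each matching pair of rows counts once.
        plain = conn.execute(
            "SELECT COUNT(*) FROM (SELECT SmallListHash " + join + ") AS t"
        ).fetchone()[0]
        # Count again after DISTINCT has collapsed repeated values.
        distinct = conn.execute(
            "SELECT COUNT(*) FROM (SELECT DISTINCT SmallListHash " + join + ") AS t"
        ).fetchone()[0]
    finally:
        conn.close()
    return plain, distinct


if __name__ == "__main__":
    # "a" appears twice in the small list and three times in the big one,
    # so the plain join yields 2 x 3 = 6 rows for a single shared value.
    print(join_row_counts(["a", "a", "b"], ["a", "a", "a", "c"]))
```

A value repeated in both lists is returned once per matching pair of rows, so duplicates multiply in the output unless DISTINCT is used.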

 

There are buttons on the toolbar that will let you export the results to a text 
file, if you’d like.

 

HTH

 

Frank Ress

Gas Technology Institute

 

From: [email protected] <mailto:[email protected]>  
[mailto:[email protected]] On Behalf Of Richard Stovall
Sent: Tuesday, June 28, 2016 1:03 PM
To: [email protected] <mailto:[email protected]> 
Subject: [NTSysADM] Compare two large lists

 

Not necessarily Windows-related.

 

I need to compare a list of about 300,000 file hashes against a larger list of 
~30,000,000 and find ones that are represented in both data sets.

 

I'm not a database guy, nor have I ever played one on TeeVee.

 

Any ideas about how to go about this with standard/free tools in Windows or 
Linux?

 

TIA,

RS

 


