In theory, as long as the 30 million records don't push the database over the
2 GB file size limit, you should be able to do the same thing in MS Access.
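
A quick back-of-the-envelope check is possible before trying it. This is only a rough sketch: the thread never says which hash type is involved, so the digest lengths below are assumptions, `raw_size_bytes` is a made-up helper name, and the figure covers the hash text alone, not Access's row, page, or index overhead.

```python
# Rough payload estimate only: ignores Access row, page, and index overhead.
ACCESS_LIMIT_BYTES = 2 * 1024**3  # the 2 GB file size limit mentioned above


def raw_size_bytes(rows, chars_per_hash, bytes_per_char=1):
    """Bytes needed for the hash text alone."""
    return rows * chars_per_hash * bytes_per_char


# Hex digest lengths: MD5 = 32, SHA-1 = 40, SHA-256 = 64 characters.
for name, length in [("MD5", 32), ("SHA-1", 40), ("SHA-256", 64)]:
    size = raw_size_bytes(30_000_000, length)
    limit_gib = ACCESS_LIMIT_BYTES / 1024**3
    print(f"{name}: {size / 1024**3:.2f} GiB of {limit_gib:.0f} GiB")
```

Real storage will be larger than this estimate, so a result anywhere near the limit is best read as "probably won't fit".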



-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Kurt Buff
Sent: Tuesday, June 28, 2016 3:40 PM
To: ntsysadm <[email protected]>
Subject: Re: [NTSysADM] Compare two large lists

That is a great answer.

Saving this one - I'm not a SQL guy either.

Kurt

On Tue, Jun 28, 2016 at 12:02 PM, Frank Ress <[email protected]> 
wrote:
> I’m a SQL guy (SQL Server of late).  You can download and install the
> Developer Edition – it’s free.  You really don’t need any of the
> add-on services (Analysis Services, Reporting Services, Integration
> Services…).  Just install the database itself and the management
> tools.  Bing/Microsoft will help answer any questions you might have
> regarding installation.
>
>
>
> Once it’s running, open Management Studio.  Connect to your new 
> instance and expand the navigator pane to see the databases.  You’ll 
> have 4 system databases by default – master, model, msdb, and tempdb.
>
>
>
> Create a new database (right-click the ‘Databases’ node, New Database,
> etc. – defaults for file names and locations will be fine).  Name it
> whatever you’d like.  Once the database is created, right-click that
> database in the navigator pane and pick ‘Tasks’/Import Data…  Assuming
> you have the hashes in a spreadsheet or whatever, just import both
> lists each into its own table using the import wizard.  Name the
> tables whatever you’d like, e.g. BigList and SmallList.  You’ll also
> give the columns with the data a name (let’s assume you have no other
> columns of info for each table, just the hashes).  You can give them
> the same name, but it’s easier if they’re unique.  Call them
> BigListHash and SmallListHash, for example.
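
If the hashes are sitting in plain text files rather than a spreadsheet, the import step comes down to reading one hash per line into a single-column table. Here is a minimal sketch of that idea, assuming Python's built-in sqlite3 module instead of SQL Server and its import wizard. The helper name `load_hashes` and the file name in the note below are hypothetical.

```python
import sqlite3


def load_hashes(conn, table, column, path):
    """Create a one-column table and fill it with one hash per line from path."""
    # Table and column names are interpolated directly; only pass trusted names.
    conn.execute(f"CREATE TABLE {table} ({column} TEXT)")
    with open(path, encoding="utf-8") as fh:
        # Skip blank lines and strip trailing newlines/whitespace.
        rows = ((line.strip(),) for line in fh if line.strip())
        conn.executemany(f"INSERT INTO {table} VALUES (?)", rows)
    conn.commit()
```

Something like `load_hashes(conn, "SmallList", "SmallListHash", "small.txt")` on an open `sqlite3` connection would then give a table matching the names used here.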
>
>
>
> Once the tables are created and populated, right-click your database 
> in the navigator again and select ‘New Query’.  A new editing pane 
> will open to the right of the navigator.  In the query pane, enter:
>
>
>
> SELECT DISTINCT SmallListHash
>   FROM SmallList
> INNER JOIN BigList ON SmallListHash = BigListHash
>
>
>
> You don’t need the ‘DISTINCT’ keyword if there are no duplicate hash
> values in your lists.  The query would perform better without it, but
> using it will eliminate any dups that exist in the data.  Other than
> the cost in speed, it can’t hurt to have it.
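
To see what DISTINCT changes, the same join can be tried on a handful of rows. This is a sketch only: it uses Python's built-in sqlite3 module with an in-memory database rather than SQL Server, and the hash values are made-up placeholders. The table and column names follow the ones suggested in this message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SmallList (SmallListHash TEXT)")
conn.execute("CREATE TABLE BigList (BigListHash TEXT)")

# Made-up sample values; 'bbb' appears twice in SmallList.
conn.executemany("INSERT INTO SmallList VALUES (?)",
                 [("aaa",), ("bbb",), ("bbb",), ("zzz",)])
conn.executemany("INSERT INTO BigList VALUES (?)",
                 [("aaa",), ("bbb",), ("ccc",), ("ddd",)])

JOIN = "FROM SmallList INNER JOIN BigList ON SmallListHash = BigListHash"
with_distinct = conn.execute(f"SELECT DISTINCT SmallListHash {JOIN}").fetchall()
without_distinct = conn.execute(f"SELECT SmallListHash {JOIN}").fetchall()

print(sorted(with_distinct))     # each matching hash once
print(sorted(without_distinct))  # the duplicated 'bbb' shows up twice
```

With clean lists the two result sets are identical, which is why DISTINCT only matters when duplicates are present.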
>
>
>
> There are buttons on the toolbar that will let you export the results 
> to a text file, if you’d like.
>
>
>
> HTH
>
>
>
> Frank Ress
>
> Gas Technology Institute
>
>
>
> From: [email protected] 
> [mailto:[email protected]]
> On Behalf Of Richard Stovall
> Sent: Tuesday, June 28, 2016 1:03 PM
> To: [email protected]
> Subject: [NTSysADM] Compare two large lists
>
>
>
> Not necessarily Windows-related.
>
>
>
> I need to compare a list of about 300,000 file hashes against a larger 
> list of ~30,000,000 and find ones that are represented in both data sets.
>
>
>
> I'm not a database guy, nor have I ever played one on TeeVee.
>
>
>
> Any ideas about how to go about this with standard/free tools in 
> Windows or Linux?
>
>
>
> TIA,
>
> RS
>
>




