That is a great answer.

Saving this one - I'm not a SQL guy either.

Kurt

On Tue, Jun 28, 2016 at 12:02 PM, Frank Ress
<[email protected]> wrote:
> I’m a SQL guy (SQL Server of late).  You can download and install the
> Developer Edition – it’s free.  You really don’t need any of the add-on
> services (Analysis Services, Reporting Services, Integration Services…).
> Just install the database itself and the management tools.  Bing/Microsoft
> will help answer any questions you might have regarding installation.
>
>
>
> Once it’s running, open Management Studio.  Connect to your new instance and
> expand the navigator pane to see the databases.  You’ll have 4 system
> databases by default – SYSTEM, MODEL, MSDB, and TEMP.
>
>
>
> Create a new database (right-click the ‘Databases’ node, New Database, etc.
> – defaults for file names and locations will be fine).  Name it whatever
> you’d like.  Once the database is created, right-click that database in the
> navigator pane and pick ‘Tasks’/Import Data…  Assuming you have the hashes
> in a spreadsheet or whatever, just import both lists each into its own table
> using the import wizard.  Name the tables whatever you’d like, e.g. BigList
> and SmallList.  You’ll also give the columns with the data a name (let’s
> assume you have no other columns of info for each table, just the hashes).
> You can give them the same name, but it’s easier if they’re unique.  Call
> them BigListHash and SmallListHash, for example.
>
>
>
> Once the tables are created and populated, right-click your database in the
> navigator again and select ‘New Query’.  A new editing pane will open to the
> right of the navigator.  In the query pane, enter:
>
>
>
> SELECT DISTINCT SmallListHash
>
>   FROM SmallList
>
> INNER JOIN BigList ON SmallListHash = BigListHash
>
>
>
> You don’t need the ‘DISTINCT’ operator if there are no duplicate hash values
> in your lists.  The query would perform better without it, but using it will
> eliminate any dups that exist in the data.  Other than speed, can’t hurt to
> have it.
>
>
>
> There are buttons on the toolbar that will let you export the results to a
> text file, if you’d like.
>
>
>
> HTH
>
>
>
> Frank Ress
>
> Gas Technology Institute
>
>
>
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Richard Stovall
> Sent: Tuesday, June 28, 2016 1:03 PM
> To: [email protected]
> Subject: [NTSysADM] Compare two large lists
>
>
>
> Not necessarily Windows-related.
>
>
>
> I need to compare a list of about 300,000 file hashes against a larger list
> of ~30,000,000 and find ones that are represented in both data sets.
>
>
>
> I'm not a database guy, nor have I ever played one on TeeVee.
>
>
>
> Any ideas about how to go about this with standard/free tools in Windows or
> Linux?
>
>
>
> TIA,
>
> RS
>
>
> ________________________________
>
> This communication is for the use of the intended recipient only. It may
> contain information that is privileged and confidential. If you are not the
> intended recipient of this communication, the disclosure, copying,
> distribution or use hereof is prohibited. If you have received this
> communication in error, please advise me by return e-mail or by telephone
> and then delete it immediately.


Reply via email to