That is a great answer. Saving this one - I'm not a SQL guy either.
Kurt On Tue, Jun 28, 2016 at 12:02 PM, Frank Ress <[email protected]> wrote: > I’m a SQL guy (SQL Server of late). You can download and install the > Developer Edition – it’s free. You really don’t need any of the add-on > services (Analysis Services, Reporting Services, Integration Services…). > Just install the database itself and the management tools. Bing/Microsoft > will help answer any questions you might have regarding installation. > > > > Once it’s running, open Management Studio. Connect to your new instance and > expand the navigator pane to see the databases. You’ll have 4 system > databases by default – SYSTEM, MODEL, MSDB, and TEMP. > > > > Create a new database (right-click the ‘Databases’ node, New Database, etc. > – defaults for file names and locations will be fine). Name it whatever > you’d like. Once the database is created, right-click that database in the > navigator pane and pick ‘Tasks’/Import Data… Assuming you have the hashes > in a spreadsheet or whatever, just import both lists each into its own table > using the import wizard. Name the tables whatever you’d like, e.g. BigList > and SmallList. You’ll also give the columns with the data a name (let’s > assume you have no other columns of info for each table, just the hashes). > You can give them the same name, but it’s easier if they’re unique. Call > them BigListHash and SmallListHash, for example. > > > > Once the tables are created and populated, right-click your database in the > navigator again and select ‘New Query’. A new editing pane will open to the > right of the navigator. In the query pane, enter: > > > > SELECT DISTINCT SmallListHash > > FROM SmallList > > INNER JOIN BigList ON SmallListHash = BigListHash > > > > You don’t need the ‘DISTINCT’ operator if there are no duplicate hash values > in your lists. The query would perform better without it, but using it will > eliminate any dups that exist in the data. Other than speed, can’t hurt to > have it. > > > > There are buttons on the toolbar that will let you export the results to a > text file, if you’d like. > > > > HTH > > > > Frank Ress > > Gas Technology Institute > > > > From: [email protected] [mailto:[email protected]] > On Behalf Of Richard Stovall > Sent: Tuesday, June 28, 2016 1:03 PM > To: [email protected] > Subject: [NTSysADM] Compare two large lists > > > > Not necessarily Windows-related. > > > > I need to compare a list of about 300,000 file hashes against a larger list > of ~30,000,000 and find ones that are represented in both data sets. > > > > I'm not a database guy, nor have I ever played one on TeeVee. > > > > Any ideas about how to go about this with standard/free tools in Windows or > Linux? > > > > TIA, > > RS > > > ________________________________ > > This communication is for the use of the intended recipient only. It may > contain information that is privileged and confidential. If you are not the > intended recipient of this communication, the disclosure, copying, > distribution or use hereof is prohibited. If you have received this > communication in error, please advise me by return e-mail or by telephone > and then delete it immediately.

