Hi Jim, The vimeo link gave me 404 though.
Tim On Thu, Aug 14, 2014 at 12:31 PM, Jim Scott <[email protected]> wrote: > I recently gave a presentation on how to use Apache Drill with some > demonstrations. > > The video for the presentation is available here: > http://vimeo.com/chug/using-apache-drill > The slides are available here: > http://www.slideshare.net/ChicagoHUG/using-apache-drill-chug-august-2014-jim-scott > > *The data used in this presentation can be found and downloaded (1.3GB CSV > file and a 100KB CSV file) * > *here:* > https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2 > https://data.cityofchicago.org/Public-Safety/Sex-Offenders/vc9r-bqvy > > Once you start your drill session you can connect to it via: > http://localhost:8047/ > > > *Queries (pay attention to back ticks) * > To see a full stack trace when errors occur: > alter session set `exec.errors.verbose`=true; > > > *Looking through the information schema:*SELECT * from > INFORMATION_SCHEMA.SCHEMATA; > SELECT * from INFORMATION_SCHEMA.`TABLES`; > SELECT * from INFORMATION_SCHEMA.`COLUMNS`; > SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE > TABLE_SCHEMA = 'sys' and TABLE_NAME = 'options'; > > > *How many of each type of incident:*select count(*) as incidents, > columns[5] as category from > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group by > columns[5] order by incidents desc limit 10; > > > *How many of each type of incident and subtype:*select count(*) as > incidents, columns[5] as type, columns[6] as subtype from > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group by > columns[6], columns[5] order by incidents desc limit 10; > > > *CREATE A TABLE in PARQUET format:* > > *Please note that I created a workspace "db" at my path of /opt/drill/db > and I gave it a default storage format of parquet. * > > create table dfs.db.`crime_data` as select columns[0] as REFID, columns[1] > as CaseNumber, columns[2] as DateTime, columns[3] as Block, columns[4] as > IUCR, columns[5] as PrimaryType, columns[6] as Description, columns[7] as > LocationDescription, columns[8] as Arrest, columns[9] as Domestic, > columns[10] as Beat, columns[11] as District, columns[12] as Ward, > columns[13] as CommunityArea, columns[14] as FBICode, columns[15] as > XCoordinate, columns[16] as YCoordinate, columns[17] as YEAR_DATE, > columns[18] as UpdatedOn, columns[19] as Latitude, columns[20] as > Longitude, columns[21] as Location from > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv`; > create table dfs.db.`sex_offenders` as select columns[0] as LAST,columns[1] > as FIRST,columns[2] as BLOCK,columns[3] as GENDER,columns[4] as > RACE,columns[5] as BIRTHDATE,columns[6] as AGE,columns[7] as > HEIGHT,columns[8] as WEIGHT,columns[9] as VICTIM_MINOR from > dfs.`/home/jscott/Downloads/chicago/Sex_Offenders.csv`; > > > *Run the previous query on the newly created table:*select count(*) as > incidents, primarytype, description from dfs.db.`crime_data` group by > description, primarytype order by incidents desc limit 10; > > > *How many crimes occurred on a block with a registered sex offender?*SELECT > count(*) as incidents FROM dfs.db.`crime_data` crimes JOIN > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block ORDER BY > incidents DESC; > > > *What types of incidents occurred?*SELECT count(*) as incidents, > crimes.primarytype as type, crimes.description as subtype FROM > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON > crimes.block = offenders.block GROUP BY crimes.description, > crimes.primarytype ORDER BY incidents DESC LIMIT 20; > > > *What all is in the BATTERY category?*SELECT count(*) as incidents, > crimes.description as subtype FROM dfs.db.`crime_data` crimes JOIN > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block WHERE > crimes.primarytype = 'BATTERY' GROUP BY crimes.description ORDER BY > incidents DESC; > > > *How many total BATTERY incidents?*SELECT count(*) as incidents FROM > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON > crimes.block = offenders.block WHERE crimes.primarytype = 'BATTERY'; > > > *How many "BATTERY" incidents occurred on those blocks by year (has it > gotten better or worse over time)?*SELECT count(*) as incidents, > extract(year from to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) as > `year` FROM dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` > offenders ON crimes.block = offenders.block WHERE crimes.primarytype = > 'BATTERY' and crimes.datetime not like '%Date%' GROUP BY extract(year from > to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) ORDER BY `year` ASC; > > > Do you have queries against this data set that you have written that you > would like to share with others? Reply to the thread... > > > -- > *Jim Scott* > Director, Enterprise Strategy & Architecture > > <http://www.mapr.com/> > [image: MapR Technologies] <http://www.mapr.com>
