Interestingly it appears to be re-directing to a URL that looks like it has a video ID on it. I haven't used vimeo, but is it possible it was uploaded but not yet made public?
-Jason On Thu, Aug 14, 2014 at 12:57 PM, Timothy Chen <[email protected]> wrote: > Hi Jim, > > The vimeo link gave me 404 though. > > Tim > > On Thu, Aug 14, 2014 at 12:31 PM, Jim Scott <[email protected]> wrote: > > I recently gave a presentation on how to use Apache Drill with some > > demonstrations. > > > > The video for the presentation is available here: > > http://vimeo.com/chug/using-apache-drill > > The slides are available here: > > > http://www.slideshare.net/ChicagoHUG/using-apache-drill-chug-august-2014-jim-scott > > > > *The data used in this presentation can be found and downloaded (1.3GB > CSV > > file and a 100KB CSV file) * > > *here:* > > > https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2 > > https://data.cityofchicago.org/Public-Safety/Sex-Offenders/vc9r-bqvy > > > > Once you start your drill session you can connect to it via: > > http://localhost:8047/ > > > > > > *Queries (pay attention to back ticks) * > > To see a full stack trace when errors occur: > > alter session set `exec.errors.verbose`=true; > > > > > > *Looking through the information schema:*SELECT * from > > INFORMATION_SCHEMA.SCHEMATA; > > SELECT * from INFORMATION_SCHEMA.`TABLES`; > > SELECT * from INFORMATION_SCHEMA.`COLUMNS`; > > SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE > > TABLE_SCHEMA = 'sys' and TABLE_NAME = 'options'; > > > > > > *How many of each type of incident:*select count(*) as incidents, > > columns[5] as category from > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group > by > > columns[5] order by incidents desc limit 10; > > > > > > *How many of each type of incident and subtype:*select count(*) as > > incidents, columns[5] as type, columns[6] as subtype from > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group > by > > columns[6], columns[5] order by incidents desc limit 10; > > > > > > *CREATE A TABLE in PARQUET format:* > > > > *Please note that I created a workspace "db" at my path of /opt/drill/db > > and I gave it a default storage format of parquet. * > > > > create table dfs.db.`crime_data` as select columns[0] as REFID, > columns[1] > > as CaseNumber, columns[2] as DateTime, columns[3] as Block, columns[4] as > > IUCR, columns[5] as PrimaryType, columns[6] as Description, columns[7] as > > LocationDescription, columns[8] as Arrest, columns[9] as Domestic, > > columns[10] as Beat, columns[11] as District, columns[12] as Ward, > > columns[13] as CommunityArea, columns[14] as FBICode, columns[15] as > > XCoordinate, columns[16] as YCoordinate, columns[17] as YEAR_DATE, > > columns[18] as UpdatedOn, columns[19] as Latitude, columns[20] as > > Longitude, columns[21] as Location from > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv`; > > create table dfs.db.`sex_offenders` as select columns[0] as > LAST,columns[1] > > as FIRST,columns[2] as BLOCK,columns[3] as GENDER,columns[4] as > > RACE,columns[5] as BIRTHDATE,columns[6] as AGE,columns[7] as > > HEIGHT,columns[8] as WEIGHT,columns[9] as VICTIM_MINOR from > > dfs.`/home/jscott/Downloads/chicago/Sex_Offenders.csv`; > > > > > > *Run the previous query on the newly created table:*select count(*) as > > incidents, primarytype, description from dfs.db.`crime_data` group by > > description, primarytype order by incidents desc limit 10; > > > > > > *How many crimes occurred on a block with a registered sex > offender?*SELECT > > count(*) as incidents FROM dfs.db.`crime_data` crimes JOIN > > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block ORDER > BY > > incidents DESC; > > > > > > *What types of incidents occurred?*SELECT count(*) as incidents, > > crimes.primarytype as type, crimes.description as subtype FROM > > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON > > crimes.block = offenders.block GROUP BY crimes.description, > > crimes.primarytype ORDER BY incidents DESC LIMIT 20; > > > > > > *What all is in the BATTERY category?*SELECT count(*) as incidents, > > crimes.description as subtype FROM dfs.db.`crime_data` crimes JOIN > > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block WHERE > > crimes.primarytype = 'BATTERY' GROUP BY crimes.description ORDER BY > > incidents DESC; > > > > > > *How many total BATTERY incidents?*SELECT count(*) as incidents FROM > > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON > > crimes.block = offenders.block WHERE crimes.primarytype = 'BATTERY'; > > > > > > *How many "BATTERY" incidents occurred on those blocks by year (has it > > gotten better or worse over time)?*SELECT count(*) as incidents, > > extract(year from to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) as > > `year` FROM dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` > > offenders ON crimes.block = offenders.block WHERE crimes.primarytype = > > 'BATTERY' and crimes.datetime not like '%Date%' GROUP BY extract(year > from > > to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) ORDER BY `year` ASC; > > > > > > Do you have queries against this data set that you have written that you > > would like to share with others? Reply to the thread... > > > > > > -- > > *Jim Scott* > > Director, Enterprise Strategy & Architecture > > > > <http://www.mapr.com/> > > [image: MapR Technologies] <http://www.mapr.com> >
