I'd be curious about the performance of dimension vs observation_fact queries as well. I don't know if I've ever seen hard numbers. It's probably a very index-sensitive measurement :).
The dimension table can have some negative implications for things like race, for example. If race is implemented according to the NIH Standard, someone can be White, Black, and Asian, but in the patient dimension, you would be limited to a single concept. But in the observation_fact table, you could easily add three rows, one for each race.

Phillip

From: Tom Mish <[email protected]>
Date: Tuesday, February 4, 2014 10:22 AM
To: Phillip Reeder <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: optional columns in i2b2 dimension tables RE: Minutes of GPV-DEV call 20140128

Great discussion all...

I'm curious about the reasoning process that led to the decision to add optional columns to the patient dimension table. Mostly, I'm curious about performance... was performance a driving reason for adding the optional columns or was it simply because the optional columns met the criteria (as pointed out in the CRC_design.pdf document - v1.6) of:

"Each record in the patient_dimension table represents a patient in the database. The table includes demographics fields such as gender, age, race, etc. Most attributes of the patient dimension table are discrete (i.e. Male/Female, Zip code, etc.)."

or was it something else that drove this decision?

Running a SQL Server backed system has led to some interesting (in a bad way) things happening with the building and sizes of our indices on some of the tables. I'm curious if Oracle based systems are running into the same trouble or if it is just those sitting on top of SQL Server...

-Tom Mish (UW-Madison)

PS: I'd second the idea of standardizing on v1.6 as proposed by Phillip.

On 2/4/2014 9:57 AM, Phillip Reeder wrote:

My understanding is the same as Dan's with regard to the dimension tables. And with regard to modifiers, it looks like the modifier column was in 1.5, but i2b2 didn't know how to use it.
"Introduced in Core i2b2 Version 1.6
In Version 1.6 of i2b2 we begin to use the modifier_cd column in the observation_fact table."
https://community.i2b2.org/wiki/display/DevForum/Modifiers+in+i2b2+Data+Model

I'd like to propose that we standardize on i2b2 1.6. From what I remember, 1.6 seemed to already be running at the majority of the sites. For those with 1.5 or previous, the data should be able to work in version 1.6; they just won't have modifiers. From there, I think we can start deciding what we want to standardize across sites (Demographics, Diagnoses, etc.) and can start deciding if we want to add/remove columns from various dimensions, query on the fact table only, or just query on dimension tables, etc.

Phillip

From: Dan Connolly <[email protected]>
Date: Monday, February 3, 2014 12:43 PM
To: "[email protected]" <[email protected]>
Subject: optional columns in i2b2 dimension tables RE: Minutes of GPV-DEV call 20140128

Well, I can only tell you that my reading is borne out by experience with removing optional columns and adding columns of our own.

From epic_dimensions_load.sql <https://informatics.kumc.edu/work/browser/heron_load/epic_dimensions_load.sql>:

    alter table NightHerondata.patient_dimension drop (zip_cd) ;
    ...
    alter table NightHerondata.patient_dimension add (date_shift number) ;
    alter table NightHerondata.patient_dimension add (ssn varchar2(45)) ;
    alter table NightHerondata.patient_dimension add (age_in_years_num_hipaa number) ;
    alter table NightHerondata.patient_dimension add (birth_date_hipaa date) ;

The way i2b2 dynamically builds queries from c_facttablecolumn, c_tablename, c_columnname, c_columndatatype, and c_operator is also suggestive of the reading that says we can add whatever columns we like to the dimension tables.
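To make the point about dynamically built queries concrete, here is a minimal sketch in Python (using the standard sqlite3 module rather than Oracle) of how a query could be assembled from those ontology columns. It is a simplification for illustration, not the actual i2b2 CRC code. Assumptions not taken from the thread: the comparison value comes from a c_dimcode column, c_columndatatype is 'T' for text and 'N' for numeric, and the helper names, sample rows, and the AGE_65_PLUS term are all hypothetical.

```python
import sqlite3


def build_dimension_query(term):
    """Turn one ontology-style row (a dict) into a SELECT against a dimension table."""
    value = term["c_dimcode"]  # assumed source of the comparison value
    if term["c_columndatatype"] == "T":
        # Text codes are quoted; embedded single quotes are doubled.
        value = "'" + value.replace("'", "''") + "'"
    return (
        f"SELECT {term['c_facttablecolumn']} FROM {term['c_tablename']} "
        f"WHERE {term['c_columnname']} {term['c_operator']} {value}"
    )


def demo_connection():
    """In-memory stand-in for patient_dimension with one locally added column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT)"
    )
    # Local extension, mirroring the 'alter table ... add' statements above.
    conn.execute("ALTER TABLE patient_dimension ADD COLUMN age_in_years_num_hipaa INTEGER")
    conn.executemany(
        "INSERT INTO patient_dimension VALUES (?, ?, ?)",
        [(1, "F", 70), (2, "M", 40), (3, "F", 65)],  # made-up sample patients
    )
    return conn


# Hypothetical ontology row that points at the locally added column.
AGE_65_PLUS = {
    "c_facttablecolumn": "patient_num",
    "c_tablename": "patient_dimension",
    "c_columnname": "age_in_years_num_hipaa",
    "c_columndatatype": "N",
    "c_operator": ">=",
    "c_dimcode": "65",
}


def patients_matching(conn, term):
    """Run the generated query and return the matching patient numbers in order."""
    sql = build_dimension_query(term) + " ORDER BY 1"
    return [row[0] for row in conn.execute(sql)]
```

Calling patients_matching(demo_connection(), AGE_65_PLUS) runs the generated SELECT against the added column. Nothing in the builder is specific to the stock columns, which is the sense in which the metadata-driven design accommodates local additions. The sketch concatenates metadata values into SQL, so it assumes the ontology rows are trusted.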
-- Dan

________________________________
From: Greater Plains Collaborative Software Development [[email protected]] on behalf of Wilson Nathan [[email protected]]
Sent: Monday, February 03, 2014 8:26 AM
To: [email protected]
Subject: Re: Minutes of GPV-DEV call 20140128

After reading that documentation, I don’t think that it states precisely what you think it states. As I read it, the documentation states that there are required and optional attributes, and that there is no limit to the number of optional attributes you use nor to the code sets and values used to populate these attributes. It doesn’t state that you can add columns as you desire, but rather that you can use the existing optional columns if you choose to.

Nathan Wilson
UW - Madison

From: Greater Plains Collaborative Software Development [mailto:[email protected]] On Behalf Of Dan Connolly
Sent: Sunday, February 02, 2014 11:53 PM
To: [email protected]
Subject: Re: Minutes of GPV-DEV call 20140128

I think I said that adding columns to the dimension tables such as patient_dimension has been a documented i2b2 technique as far back as 1.3 or 1.4. Double-checking, I find:

"The Patient table may have an unlimited number of optional columns and their data types and coding systems are local implementation-specific."
-- 3.3 PATIENT_DIMENSION
i2b2 Clinical Research Chart (CRC) Design Document
Document Version: 1.1
I2b2 Software Release: 1.4
https://www.i2b2.org/software/projects/hivecore/i2b2core-doc-14.zip

I don't believe I had anything to say about what the slides said about modifiers (i.e. that they appear in 1.6).
-- Dan

________________________________
From: Greater Plains Collaborative Software Development [[email protected]] on behalf of Campbell, James R [[email protected]]
Sent: Tuesday, January 28, 2014 2:43 PM
To: [email protected]
Subject: Minutes of GPV-DEV call 20140128

GPC standard data model

JRC: i2b2 documentation reports changes in functionality by version (now 1.7). Will minimum version be required for data standardization?

DC: although documentation reports differently, modifiers available in v1.3

________________________________
UT Southwestern Medical Center
The future of medicine, today.

Thomas Mish
Informatics Systems Specialist
SMPH-IT, Biomedical Informatics Services
School of Medicine and Public Health, UW-Madison
[email protected] | Tel:(608)616-0362
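Returning to the race example at the top of the thread, the difference between the two tables can be sketched with a toy schema. This is a minimal illustration in Python with sqlite3, not the real i2b2 DDL: the tables are cut down to two columns each (the real observation_fact also carries encounter, provider, date, and modifier columns), and the 'DEM|RACE:...' concept codes, the sample patient, and the function names are hypothetical.

```python
import sqlite3


def demo_race_db():
    """Build a toy database with one multi-race patient."""
    conn = sqlite3.connect(":memory:")
    # One row per patient, so race_cd can hold only a single value.
    conn.execute(
        "CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, race_cd TEXT)"
    )
    # One row per observation, so a patient can carry any number of race facts.
    conn.execute("CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT)")
    conn.execute("INSERT INTO patient_dimension VALUES (1, 'white')")
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?)",
        [(1, "DEM|RACE:white"), (1, "DEM|RACE:black"), (1, "DEM|RACE:asian")],
    )
    return conn


def race_from_dimension(conn, patient_num):
    """Return the single race code stored on the patient's dimension row."""
    row = conn.execute(
        "SELECT race_cd FROM patient_dimension WHERE patient_num = ?", (patient_num,)
    ).fetchone()
    return row[0]


def races_from_facts(conn, patient_num):
    """Return every race concept recorded for the patient in the fact table."""
    rows = conn.execute(
        "SELECT concept_cd FROM observation_fact "
        "WHERE patient_num = ? AND concept_cd LIKE 'DEM|RACE:%' "
        "ORDER BY concept_cd",
        (patient_num,),
    )
    return [r[0] for r in rows]
```

With this data the dimension lookup can report only one race for patient 1, while the fact-table query returns all three rows. The trade-off raised in the thread still applies: the fact-table form needs a filter on concept_cd, so its performance is likely to depend on how observation_fact is indexed.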
