Based on input from James Topping of Duke, I have come to learn the following:
There are two different mechanisms of data query. 1. Query against a set of SAS tables (a SAS dataset), or a locked, static RDBMS (which I understand to mean a database whose contents are temporarily frozen . For those folks who are doing a regular CDM v3 build as part of their EHR ETL process, this would imply making a single point-in-time backup copy of the CDM tables after the diagnostic query is run - the idea being that if the Duke folks detect problems, they can point specifically to the data that caused it, with some guarantee that the problem data hasn't changed under them). 2. "Menu-driven query" through PopMedNet to access the RDBMS directly. No locking of the data is necessary. However, option 2 is not yet rolled out. In either case, it is important to note that one of the major changes between CDM v2 and v3 is that in v2, instead of using the intrinsic date/time RDBMS data types (which SAS can understand), the tables used VARCHAR(N) fields to store text as YYYY-MM-DD or HH:MM:SS. Nathan Graham noted that queries that involved date fields ran very slowly because the data had to be converted into dates by the DBMS optimizer dynamically, a row at a time. This design mistake was rectified in version 3 where the document now specifies that intrinsic RDBMS date types should be used. (Our ETL generated intrinsic date/time types, and some "hello world" SAS queries against these fields worked fine.) I've been studying the SAS documentation for connectivity to RDBMSs, and they specify that if you use ODBC, there are two ways of accessing RDBMS data. 1. The preferred approach (in my opinion) is using "SQL pass-through", where the SQL is sent directly to the RDBMS without interception/interpretation by SAS. This allows the DBMS optimizer to do its thing (e.g., use server-based indexes) , so that queries run at server speeds. 2. The alternative method (using the SAS LIBNAME command) is more flexible, in that SAS's dialect of SQL is employed instead (so that certain intrinsic SAS functions without an RDBMS counterpart may be leveraged - e.g., for specialized statistical analyses), but all processing is done in SAS (i.e., on the client), so that queries involving multi-table joins, or accessing large amounts of raw data that return concise summaries (e.g, GROUP BY queries) run very slowly because of network-bandwidth becomes a rate limiting factor. Prakash ________________________________ Notice: This UI Health Care e-mail (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be legally privileged. If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution, or copying of this communication is strictly prohibited. Please reply to the sender that you have received the message in error, then delete it. Thank you. ________________________________
_______________________________________________ Gpc-dev mailing list [email protected] http://listserv.kumc.edu/mailman/listinfo/gpc-dev
