Re: [CODE4LIB] Processing Circ data

Harper, Cynthia Thu, 06 Aug 2015 06:07:08 -0700

I have compacted the database, and I'm using the Group By SQL query. I think I 
actually am hitting the 2GB limit, because of all the data I have for each row. 
I'm wondering if having added a field for reserves history notes, that that's 
treated as a fixed-length field for every record, rather than variable length, 
and just appearing for the small number of records that have been put on 
reserve.  I suppose if I exported my data in two tables - bib and item data, 
the database would be much more efficient than the flat-file approach I've been 
using.  Time to turn the mind back on, rather than just taking the lazy 
approach every time...


Cindy

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kevin 
Ford
Sent: Wednesday, August 05, 2015 5:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Processing Circ data

On the surface, your difficulties suggest you may need look at a few 
optimization tactics. Apologies if these are things you've already considered 
and addressed - just offering a suggestion.

This page [1] is for Access 2003 but the items under "Improve query 
performance" should apply - I think - to newer versions also.  I'll draw 
specific attention to 1) Compacting the database; 2) making sure you have an 
index set up on the bib record number field and number of circs field; and 3) 
make sure you are using hte "Group by" sql syntax [2].

Now, I'm not terribly familiar with Access so I can't actually help you with 
point/click instructions, but the above are common 'gotchas' that could be a 
problem regardless of RDBMS.

Yours,
Kevin

[1] https://support.microsoft.com/en-us/kb/209126
[2] http://www.w3schools.com/sql/sql_groupby.asp



On 8/5/15 4:01 PM, Harper, Cynthia wrote:
> Well, I guess it could be bad data, but I don't know how to tell. I think 
> I've done more than this before.
>
> I have a "Find duplicates" query that groups by bib record number.  That 
> query seemed to take about 40 minutes to process. Then I added a criterion to 
> limit to only records that had >0 circs this year. That query displays the 
> rotating cursor, then says "Not Responding", then the cursor, and loops 
> through that for hours.  Maybe I can find the Access bad data, but I'd be 
> glad to find a more modern data analysis software.  My db is 136,256 kb.  But 
> adding that extra query will probably put it over the 2GB mark.  I've tried 
> extracting to a csv, and that didn't work. Maybe I'll try a Make table to a 
> separate db.
>
> Or the OpenRefine suggestion sounds good too.
>
> Cindy Harper
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
> Of Kevin Ford
> Sent: Wednesday, August 05, 2015 4:23 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Processing Circ data
>
> Hi Cindy,
>
> This doesn't quite address your issue, but, unless you've hit the 2 GB Access 
> size limit [1], Access can handle a good deal more than 250,000 item records 
> ("rows," yes?) you cited.
>
> What makes you think you've hit the limit?  Slowness, something else?
>
> All the best,
> Kevin
>
> [1]
> https://support.office.com/en-us/article/Access-2010-specifications-1e
> 521481-7f9a-46f7-8ed9-ea9dff1fa854
>
>
>
>
>
> On 8/5/15 3:07 PM, Harper, Cynthia wrote:
>> Hi all. What are you using to process circ data for ad-hoc queries.  I 
>> usually extract csv or tab-delimited files - one row per item record, with 
>> identifying bib record data, then total checkouts over the given time 
>> period(s).  I have been importing these into Access then grouping them by 
>> bib record. I think that I've reached the limits of scalability for Access 
>> for this project now, with 250,000 item records.  Does anyone do this in R?  
>> My other go-to- software for data processing is RapidMiner free version.  Or 
>> do you just use MySQL or other SQL database?  I was looking into doing it in 
>> R with RSQLite (just read about this and sqldf  
>> http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because ...  I'm 
>> rusty enough in R that if anyone will give me some start-off data import 
>> code, that would be great.
>>
>> Cindy Harper
>> E-services and periodicals librarian
>> Virginia Theological Seminary
>> Bishop Payne Library
>> 3737 Seminary Road
>> Alexandria VA 22304
>> char...@vts.edu<mailto:char...@vts.edu>
>> 703-461-1794
>>

Re: [CODE4LIB] Processing Circ data

Reply via email to