Howdy,
I have 12K docs that look like this:
{
  "_id": "000111bf7a8515da822b05ebbb8cd257",
  "_rev": "94750440",
  "month": 17,
  "store": {
    "store_num": 123,
    "city": "Atlanta",
    "state": "GA",
    "zip": "30301",
    "exterior": true,
    "interior": true,
    "restroom": true,
    "breakfast": true,
    "sunday": true,
    "adi_name": "Atlanta, GA",
    "adi_num": 123,
    "ownership": "Company",
    "playground": "Indoor",
    "seats": 123,
    "parking_spaces": 123
  },
  "raw": {
    "Other Hourly Pay": 0.28,
    "Workers Comp - State Funds Exp": 401.65,
    "Rent Expense - Company": -8,
    "Archives Expense": 82.81,
    "Revised Hours allowed per": 860.22,
    "Merch Standard": 174.78,
    "Total Property Tax": 1190.91
    ...
  }
}
I truncated 'raw', but it's usually much longer; the average doc size is about 5K.
I'm trying to work out how I will query them with views. I want to be
able to filter down by various 'store' sub-fields, e.g. all the breakfast
= true stores in Georgia that are owned by franchisees. However, the
combination of filters will differ for just about every query. A rough
sketch of the kind of map function I mean is below.
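Something like this, with the field names taken from the sample doc above
and the key layout just one guess at a filter combination:

function (doc) {
  // Sketch: key on a few 'store' sub-fields and emit the whole 'raw'
  // object as the value, so a reduce can average it later.
  if (doc.store && doc.raw) {
    emit([doc.store.state, doc.store.ownership, doc.store.breakfast], doc.raw);
  }
}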
The 'reduce' function would then average each line item in the 'raw'
field.
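Something along these lines, assuming every 'raw' line item is a plain
number; it carries per-field sums and counts so the rereduce passes stay
correct, and the client would divide sum by count at the end:

function (keys, values, rereduce) {
  // Sketch: accumulate {sum, count} per 'raw' line item.
  var acc = {};
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    for (var field in v) {
      if (!acc[field]) acc[field] = { sum: 0, count: 0 };
      if (rereduce) {
        // v[field] is a {sum, count} pair from a previous reduce pass
        acc[field].sum += v[field].sum;
        acc[field].count += v[field].count;
      } else {
        // v[field] is a raw number straight out of the map value
        acc[field].sum += v[field];
        acc[field].count += 1;
      }
    }
  }
  return acc;
}

I realize a reduce value this wide might trip CouchDB's reduce_limit check
if there are lots of distinct line items, which may itself be a hint that
I'm holding it wrong.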
I have played around with views that take the store filters, but just
returning the 'raw' field as the value from the map function is
brutally slow in Futon. Because the view is built on first access, the
initial build takes about 3-4 minutes (on a MBP with 4GB RAM, a 2.2GHz
dual core, and a 7200RPM disk). I understand that the next time this
specific store grouping is requested it's fast... but the queries will
all be so dynamic that this seems prohibitively slow.
So I thought: should I be doing this in two steps? Set up the key to
be the store fields plus whatever else I might want to query on (month
or whatever timeframe), and return the doc IDs as the values from the
original query. I would then send in a complex key to do the filtering.
This would require waiting for the _bulk_get functionality: I'd send
that list of IDs into a second query to get the raw data and feed it to
'map'.
This is slow now on 12K docs... It needs to be stupid-fast at that low
number of docs, because the plan is for *way* more data.
The filtering part is tailor-made for an RDBMS, but the doc handling
(all the 'raw' fields will be different store-by-store and industry-by-
industry, will change over time, and are in general free-form) is perfect
for CouchDB. Thoughts? I want to use the right tool for the job, and
that's looking like an RDBMS, sadly. That is, unless I'm completely
misusing Couch. In which case, swift blows to the head are welcome.
Cheers,
BA