Howdy,

I have 12K docs that look like this:

{
 "_id": "000111bf7a8515da822b05ebbb8cd257",
 "_rev": "94750440",
 "month": 17,
 "store": {
  "store_num": 123,
  "city": "Atlanta",
  "state": "GA",
  "zip": "30301",
  "exterior": true,
  "interior": true,
  "restroom": true,
  "breakfast": true,
  "sunday": true,
  "adi_name": "Atlanta, GA",
  "adi_num": 123,
  "ownership": "Company",
  "playground": "Indoor",
  "seats": 123,
  "parking_spaces": 123
 },
 "raw": {
  "Other Hourly Pay": 0.28,
  "Workers Comp - State Funds Exp": 401.65,
  "Rent Expense - Company": -8,
  "Archives Expense": 82.81,
  "Revised Hours allowed per": 860.22,
  "Merch Standard": 174.78,
  "Total Property Tax": 1190.91

  ...

 }
}

I've truncated 'raw', but it's usually much longer; the average doc size is ~5 KB.

I'm trying to work out how to query them with views. I want to be able to filter by various 'store' sub-fields, e.g. all the breakfast = true stores in Georgia that are owned by franchisees. However, the combination of filters will differ for just about every query.
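For what it's worth, here's a minimal sketch of the kind of map function I mean, keyed on a few of the store fields (the field names come from the sample doc above; the choice of [state, ownership, breakfast] is just illustrative, and the emit() stub and sample-doc call at the bottom are only there so the sketch runs standalone):

```javascript
// CouchDB provides emit() to map functions; stub it so this runs standalone.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map: key on [state, ownership, breakfast] so one view can answer
// queries like ?key=["GA","Company",true] without scanning everything.
var map = function (doc) {
  if (doc.store && doc.raw) {
    emit([doc.store.state, doc.store.ownership, doc.store.breakfast],
         doc.raw);
  }
};

// Run against a trimmed-down copy of the sample doc:
map({
  _id: "000111bf7a8515da822b05ebbb8cd257",
  store: { state: "GA", ownership: "Company", breakfast: true },
  raw: { "Archives Expense": 82.81 }
});
```

The catch, of course, is that a fixed key order only covers one family of queries, which is exactly the problem I'm describing.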

The 'reduce' function would then average each line in the 'raw' field.
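Something like this sketch, which folds each 'raw' line into a {sum, count} pair so the client can compute avg = sum / count, and handles the rereduce pass where the incoming values are already {sum, count} objects. (One caveat I'm aware of: CouchDB complains when reduce output doesn't shrink, and a wide 'raw' object might trip that check.)

```javascript
var reduce = function (keys, values, rereduce) {
  var acc = {};
  values.forEach(function (v) {
    for (var field in v) {
      var cur = acc[field] || { sum: 0, count: 0 };
      if (rereduce) {
        // v[field] is already a partial {sum, count} from a prior pass.
        cur.sum += v[field].sum;
        cur.count += v[field].count;
      } else {
        // v[field] is a raw number straight from the map output.
        cur.sum += v[field];
        cur.count += 1;
      }
      acc[field] = cur;
    }
  });
  return acc;
};

// Two docs' 'raw' values in the same key group (numbers illustrative):
var out = reduce(null,
  [{ "Archives Expense": 80 }, { "Archives Expense": 90 }],
  false);
// avg = out["Archives Expense"].sum / out["Archives Expense"].count
```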

I have played around with views that take the store filters, but just returning the 'raw' field as the value from the map function is brutally slow in Futon. That's because the view index is built on first access, which takes about 3-4 minutes (on a MBP with 4 GB RAM, a 2.2 GHz dual core, and a 7200 RPM disk). I understand that the next time this specific store group is requested it's fast... but the groupings will all be so dynamic that this seems prohibitively slow.

So, I thought, should I be doing this in two steps? Set up the key to be the store fields plus whatever else I might want to query on (month or some other timeframe), and return the doc IDs as the values from the original query, sending in a complex key to do the filtering. I'd then send that list of IDs into a second query to get the raw data and feed it to the averaging step. This would seem to require waiting for the _bulk_get functionality.
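Unless I'm mistaken, the second step may not need _bulk_get at all: CouchDB already lets you POST a {"keys": [...]} body to /db/_all_docs?include_docs=true and get all the docs back in one round trip. A sketch of wiring step 1's view response into step 2's request body (the view response and second ID are made up for illustration):

```javascript
// Step 1's view emitted doc IDs as values; a response might look like:
var viewResponse = {
  rows: [
    { key: ["GA", "Franchise", true], value: "000111bf7a8515da822b05ebbb8cd257" },
    { key: ["GA", "Franchise", true], value: "000222bf7a8515da822b05ebbb8cd258" }
  ]
};

// Step 2: build the bulk-read body from the IDs. POST this to
// /db/_all_docs?include_docs=true, then read row.doc.raw from each
// returned row.
var bulkBody = JSON.stringify({
  keys: viewResponse.rows.map(function (r) { return r.value; })
});
```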

This is already slow at 12K docs... and it needs to be stupid-fast at that small a doc count, because the plan is for *way* more data.

The filtering part is tailor-made for an RDBMS, but the doc handling (the 'raw' fields will all differ store by store and industry by industry, change over time, and in general be free-form) is perfect for CouchDB. Thoughts? I want to use the right tool for the job, and that's looking like an RDBMS, sadly. That is, unless I'm completely misusing Couch, in which case swift blows to the head are welcome.

Cheers,
BA
