Howdy,
I have 12K docs that look like this:
{
  "_id": "000111bf7a8515da822b05ebbb8cd257",
  "_rev": "94750440",
  "month": 17,
  "store": {
    "store_num": 123,
    "city": "Atlanta",
    "state": "GA",
    "zip": "30301",
    "exterior": true,
    "interior": true,
    "restroom": true,
    "breakfast": true,
    "sunday": true,
    "adi_name": "Atlanta, GA",
    "adi_num": 123,
    "ownership": "Company",
    "playground": "Indoor",
    "seats": 123,
    "parking_spaces": 123
  },
  "raw": {
    "Other Hourly Pay": 0.28,
    "Workers Comp - State Funds Exp": 401.65,
    "Rent Expense - Company": -8,
    "Archives Expense": 82.81,
    "Revised Hours allowed per": 860.22,
    "Merch Standard": 174.78,
    "Total Property Tax": 1190.91
    ...
  }
}
I truncated 'raw', but it's usually much longer; the average doc size is about 5K.
I'm trying to work out how I will query them with views. I want to be
able to filter down by various 'store' sub-fields, e.g. all the breakfast
= true stores in Georgia that are owned by franchisees. However, the
combination of filters will differ for just about every query. A rough
sketch of the kind of map function I mean is below.
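Something like this, with the field names taken from the sample doc above
and the key layout just one guess at a filter combination:

function (doc) {
  // Sketch: key on a few 'store' sub-fields and emit the whole 'raw'
  // object as the value, so a reduce can average it later.
  if (doc.store && doc.raw) {
    emit([doc.store.state, doc.store.ownership, doc.store.breakfast], doc.raw);
  }
}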
The 'reduce' function would then average each line item in the 'raw'
field.
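Something along these lines, assuming every 'raw' line item is a plain
number; it carries per-field sums and counts so the rereduce passes stay
correct, and the client would divide sum by count at the end:

function (keys, values, rereduce) {
  // Sketch: accumulate {sum, count} per 'raw' line item.
  var acc = {};
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    for (var field in v) {
      if (!acc[field]) acc[field] = { sum: 0, count: 0 };
      if (rereduce) {
        // v[field] is a {sum, count} pair from a previous reduce pass
        acc[field].sum += v[field].sum;
        acc[field].count += v[field].count;
      } else {
        // v[field] is a raw number straight out of the map value
        acc[field].sum += v[field];
        acc[field].count += 1;
      }
    }
  }
  return acc;
}

I realize a reduce value this wide might trip CouchDB's reduce_limit check
if there are lots of distinct line items, which may itself be a hint that
I'm holding it wrong.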
I have played around with views that take the store filters, but just
returning the 'raw' field as the value from the map function is
brutally slow in Futon. Because the view is built on first access, the
initial build takes about 3-4 minutes (on a MBP with 4GB RAM, a 2.2GHz
dual core, and a 7200RPM disk). I understand that the next time this
specific store grouping is requested it's fast... but the queries will
all be so dynamic that this seems prohibitively slow.
So I thought: should I be doing this in two steps? Set up the key to
be the store fields plus whatever else I might want to query on (month
or whatever timeframe), and return the doc IDs as the values from the
original query. I would then send in a complex key to do the filtering.
This would require waiting for the _bulk_get functionality: I'd send
that list of IDs into a second query to get the raw data and feed it to
'map'.
This is slow now on 12K docs... It needs to be stupid-fast at that low
number of docs, because the plan is for *way* more data.
The filtering part is tailor-made for an RDBMS, but the doc handling
(all the 'raw' fields will be different store-by-store and industry-by-
industry, will change over time, and are in general free-form) is perfect
for CouchDB. Thoughts? I want to use the right tool for the job, and
that's looking like an RDBMS, sadly. That is, unless I'm completely
misusing Couch. In which case, swift blows to the head are welcome.
Cheers,
BA