Now that someone's said it, "just store tweets" seems like such a "duh" move.
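For illustration, here is a minimal JavaScript sketch of the "just store tweets" layout suggested in the quoted reply. Every tweet becomes its own document, written once and never updated. A CouchDB-style map function then indexes each tweet under each of its hashtags. JavaScript is used because it is the default language for CouchDB view functions.

Several names are assumptions made for this sketch:
- The document fields (`_id`, `text`, `hashtags`) are hypothetical.
- `emit` here is a local stand-in for the one CouchDB's view server provides.
- `buildIndex`, `tweetsFor`, and `countsByTag` are hypothetical helpers that only imitate view queries.

A real setup would store just the `map` function in a design document and query the view over HTTP. You would use the `key` parameter to fetch one hashtag, or the built-in `_count` reduce with `group=true` for per-hashtag counts.

```javascript
// Hypothetical tweet documents: one document per tweet (append-only),
// instead of one ever-growing document per hashtag.
const tweets = [
  { _id: "t1", text: "Learning #clojure and #couchdb", hashtags: ["clojure", "couchdb"] },
  { _id: "t2", text: "More #clojure", hashtags: ["clojure"] },
  { _id: "t3", text: "Hello #couchdb", hashtags: ["couchdb"] },
];

// Local stand-in for the emit() that CouchDB's JavaScript view server supplies.
let emit;

// CouchDB-style map function: emits the tweet once under each of its hashtags.
// Only this function would go into the design document's "map" field.
function map(doc) {
  if (Array.isArray(doc.hashtags)) {
    for (const tag of doc.hashtags) {
      emit(tag, doc.text);
    }
  }
}

// Hypothetical helper: runs map over every doc and orders rows by key, then
// by doc id, roughly the way a view index is ordered.
function buildIndex(docs) {
  const rows = [];
  for (const doc of docs) {
    emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc);
  }
  return rows.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
}

// Hypothetical helper imitating a view query with key=<tag>.
function tweetsFor(index, tag) {
  return index.filter((row) => row.key === tag).map((row) => row.value);
}

// Hypothetical helper imitating the built-in _count reduce with group=true.
function countsByTag(index) {
  const counts = {};
  for (const row of index) {
    counts[row.key] = (counts[row.key] || 0) + 1;
  }
  return counts;
}

// Adding a tweet is a single insert of a new document; no per-hashtag
// document is read, modified, and written back.
tweets.push({ _id: "t4", text: "#clojure again", hashtags: ["clojure"] });

const index = buildIndex(tweets);
console.log(tweetsFor(index, "clojure"));
console.log(countsByTag(index));
```

In this sketch the reduce step only counts, rather than building the full hashtag-to-tweets map the quoted reply suggests. CouchDB expects reduce output to be much smaller than its input and may reject a reduce whose output keeps growing. A plain key query on the map rows already returns every tweet for a hashtag.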
Thanks!
-sam

On Monday, December 15, 2014 6:35:13 AM UTC-5, Thomas Heller wrote:
>
> Hey,
>
> without knowing much about your application/business needs it's hard to
> speculate what might be good for you. The root of your problem might be
> CouchDB, since it was never meant for "Big Data", and since we are talking
> tweets I generally think "a lot". I'm not sure how your map value looks, but
> I think you do something like
>
> obj = (couch/get hash-tag)
> obj = (my-app/update obj new-tweet)
> (couch/put hash-tag obj)
>
> which will always perform badly since you cannot do this concurrently,
> except with CRDTs, which CouchDB doesn't support since it does its own
> MVCC. I don't remember exactly how their conflict resolution works, but I
> think it was "last write wins". Caching will not save you for long, since
> writes will eventually become the bottleneck.
>
> Why not use a CouchDB view to create the hash-tag map on the server
> and then just store the tweets append-only? The view's map function can
> then just emit each tweet under the hash-tag key (once for each tag), and
> the reduce function can build your map. That should perform a lot better
> up to a certain point, and you can control how up-to-date your view index
> has to be.
>
> Anyway, it might be best to choose another database. Regardless of what
> database you are using, updating a single place concurrently is going to be
> a problem. An Atom in Clojure makes this look like a no-brainer, but under
> high load it can still blow up since it has no back-pressure of any kind.
>
> "Big Data" and "Distributed Systems" are hard and cannot be described
> briefly. Without exact knowledge of what your app/business needs look like,
> it is impossible to make the "correct" recommendation.
>
> HTH,
> /thomas
>
> On Monday, December 15, 2014 4:54:04 AM UTC+1, Sam Raker wrote:
>>
>> I'm (still) pulling tweets from Twitter, processing them, and storing
>> them in CouchDB with hashtags as doc ids, such that if a tweet contains 3
>> hashtags, that tweet will be indexed under each of those 3 hashtags. My
>> application hits CouchDB for the relevant document and uses Cheshire to
>> convert the resulting string to a map. The map's values consist of a few
>> string values and an array that consists of all the tweets that contain
>> that hashtag. The problem is thus with common hashtags: the more tweets
>> contain a given hashtag, the longer that hashtag's "tweets" array will be,
>> and, additionally, the more often that document will be retrieved from
>> CouchDB. The likelihood and magnitude of performance hits on my app are
>> therefore correlated, which is Bad.
>>
>> I'm reaching out to you all for suggestions about how best to deal with
>> this situation. Some way of caching something, somehow? I'm at a loss, but
>> I want to believe there's a solution.
>>
>>
>> Thanks,
>> -sam
>>
>

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en