Re: Handling increasingly-intensive processes

Sam Raker Tue, 16 Dec 2014 04:35:21 -0800

Now that someone's said it, "just store tweets" seems like such a "duh" 
move.


Thanks!
-sam

On Monday, December 15, 2014 6:35:13 AM UTC-5, Thomas Heller wrote:
>
> Hey,
>
> without knowing much about your application/business needs it's hard to 
> speculate what might be good for you. The root of your problem might be 
> CouchDB since it was never meant for "Big Data" and since we are talking 
> tweets I generally think "a lot". I'm not sure how your map value looks but 
> I think you do something like
>
> (let [obj     (couch/get hash-tag)
>       updated (my-app/update obj new-tweet)]
>   (couch/put hash-tag updated))
>
> Which will always perform badly since you cannot do this concurrently, 
> except with CRDTs which CouchDB doesn't support since it does its own 
> MVCC. I don't remember exactly how their conflict resolution works but I 
> think it was "last write wins". Caching will not save you for long, since 
> writes will eventually become the bottleneck.
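
To make the get/update/put cycle above concrete: CouchDB expects the document's current `_rev` on every update and answers a stale one with 409 Conflict, so concurrent writers to a single hashtag document end up re-reading and retrying. Below is a minimal in-memory sketch of that check. `RevStore` and `appendTweet` are hypothetical names made up for illustration (not CouchDB or client-library APIs), and revisions are simplified to integers.

```javascript
// In-memory stand-in for one CouchDB database (hypothetical, illustration only).
class RevStore {
  constructor() {
    this.docs = new Map();
  }

  // Return a deep copy so callers cannot mutate stored state directly.
  get(id) {
    const doc = this.docs.get(id);
    return doc ? JSON.parse(JSON.stringify(doc)) : null;
  }

  // Accept the write only if the caller holds the current revision,
  // mimicking CouchDB's 409 Conflict on a stale _rev.
  put(id, doc) {
    const current = this.docs.get(id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      return { ok: false, status: 409 };
    }
    const nextRev = (currentRev || 0) + 1;
    const stored = Object.assign({}, doc, { _rev: nextRev });
    this.docs.set(id, JSON.parse(JSON.stringify(stored)));
    return { ok: true, rev: nextRev };
  }
}

// Read-modify-write with retry: every conflict costs another full read
// of the (ever-growing) hashtag document. Returns the number of attempts used.
function appendTweet(store, hashtag, tweet, maxRetries = 5) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const doc = store.get(hashtag) || { tweets: [] };
    doc.tweets.push(tweet);
    if (store.put(hashtag, doc).ok) return attempt;
  }
  throw new Error("gave up after " + maxRetries + " conflicts");
}
```

Two writers that both read the same revision cannot both succeed; the loser has to fetch the whole document again, which is exactly what gets expensive for popular hashtags.
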
>
> Why not use a CouchDB view to create the hash-tag map on the server 
> and store the tweets append-only? The view's map function can then just 
> emit each tweet under the hash-tag key (once for each tag) and the reduce 
> function can build your map. That should perform a lot better up to a 
> certain point and you can control how up-to-date your view index has to be.
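
A rough sketch of the view idea above, assuming one document per tweet with a `hashtags` array and a `text` field (that document shape is an assumption, not something stated in the thread). The `map` function is what would go into a design document; `emit` is supplied by CouchDB when the view runs, so the small harness around it (`rows`, `buildIndex`, `queryByKey`) is hypothetical and only there to run the function locally.

```javascript
// Map function as it could appear in a design document's view.
// Assumed document shape (hypothetical): { _id, text, hashtags: ["tag", ...] }.
function map(doc) {
  if (!Array.isArray(doc.hashtags)) return; // skip documents without hashtags
  doc.hashtags.forEach(function (tag) {
    // One row per hashtag, so a tweet with 3 tags is indexed 3 times.
    emit(tag.toLowerCase(), { id: doc._id, text: doc.text });
  });
}

// Local stand-in for the view index; CouchDB provides the real emit().
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Run the map function over all documents and keep rows ordered by key,
// similar to how a view index is ordered.
function buildIndex(docs) {
  rows.length = 0;
  docs.forEach(map);
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// Equivalent in spirit to querying the view with ?key="<hashtag>".
function queryByKey(key) {
  return rows.filter(function (row) {
    return row.key === key;
  });
}
```

Against a real server the lookup would be a view query with `key="clojure"`, and in CouchDB 1.x the `stale=ok` query parameter lets a read skip the index update. If only per-tag counts are needed, the built-in `_count` reduce is enough.
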
>
> Anyway, it might be best to choose another database. Regardless of what 
> database you are using, updating a single place concurrently is going to be 
> a problem. An Atom in Clojure makes this look like a no-brainer but under 
> high load it can still blow up since it has no back-pressure of any kind.
>
> "Big Data" and "Distributed Systems" are hard and cannot be described in 
> short. Without exact knowledge of what your app/business needs look like it 
> is impossible to make the "correct" recommendation.
>
> HTH,
> /thomas
>
> On Monday, December 15, 2014 4:54:04 AM UTC+1, Sam Raker wrote:
>>
>> I'm (still) pulling tweets from twitter, processing them, and storing 
>> them in CouchDB with hashtags as doc ids, such that if a tweet contains 3 
>> hashtags, that tweet will be indexed under each of those 3 hashtags. My 
>> application hits CouchDB for the relevant document and uses Cheshire to 
>> convert the resulting string to a map. The map's values consist of a few 
>> string values and an array that consists of all the tweets that contain 
>> that hashtag. The problem is thus with common hashtags: the more tweets 
>> contain a given hashtag, the longer that hashtag's "tweets" array will be, 
>> and, additionally, the more often that document will be retrieved from 
>> CouchDB. The likelihood and magnitude of performance hits on my app are 
>> therefore correlated, which is Bad.
>>
>> I'm reaching out to you all for suggestions about how best to deal with 
>> this situation. Some way of caching something, somehow? I'm at a loss, but 
>> I want to believe there's a solution.
>>
>>
>> Thanks,
>> -sam
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
