The important thing to take away from Ryan's original post is that
batches are not there for performance.  The only case where I consider
batches useful is when you absolutely need to know that several tables
all get a mutation (via logged batches), typically because multiple
tables serve as different views of the same data.  Batching is absolutely
not going to help if you're trying to lump queries together to reduce
network & server overhead; in fact it'll do the opposite.  If that's
your goal, perform many async queries instead.  The overhead of batches
in Cassandra is significant, and you're going to hit a lot of problems
(timeouts / failures) if you use them excessively.
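
To make "many async queries" concrete, here is a rough sketch of the
pattern: one request per row, issued concurrently, with a cap on how many
are in flight.  It is not tied to any particular driver.  write_row,
write_all, run_demo and max_in_flight are made-up names for illustration,
and write_row only stands in for whatever asynchronous execute call your
Cassandra driver provides.

```python
import asyncio


async def write_row(row, store):
    # Hypothetical stand-in for a driver's asynchronous write call. A real
    # application would send one INSERT/UPDATE per row through its driver.
    await asyncio.sleep(0)  # yield control, as a network round trip would
    store[row["id"]] = row


async def write_all(rows, store, max_in_flight=32):
    # Cap concurrent requests so the cluster is not flooded.
    limit = asyncio.Semaphore(max_in_flight)

    async def guarded(row):
        async with limit:
            await write_row(row, store)

    # Each row is its own request, so a failure is reported per row
    # instead of failing one large batch.
    results = await asyncio.gather(
        *(guarded(row) for row in rows), return_exceptions=True
    )
    # Return the rows whose write raised, so the caller can retry them.
    return [row for row, res in zip(rows, results) if isinstance(res, Exception)]


def run_demo():
    store = {}
    rows = [{"id": i, "value": f"v{i}"} for i in range(1000)]
    failed = asyncio.run(write_all(rows, store))
    return store, failed
```

The cap on in-flight requests is the part worth tuning; the value 32 is
an arbitrary placeholder, not a recommendation.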

tl;dr: you probably don't want batch, you most likely want many async calls

On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller <moham...@glassbeam.com>
wrote:

>  Ryan,
>
> Thanks for the quick response.
>
>
>
> I did see that JIRA before posting my question on this list. However, I
> didn’t see any information about why 5kb+ of data would cause instability.
> 5kb or even 50kb seems too small. For example, if each mutation is 1000+
> bytes, then just 5 mutations will hit that threshold.
>
>
>
> In addition, Patrick is saying that he does not recommend more than 100
> mutations per batch. So why not warn users just on the # of mutations in a
> batch?
>
>
>
> Mohammed
>
>
>
> *From:* Ryan Svihla [mailto:rsvi...@datastax.com]
> *Sent:* Thursday, December 11, 2014 12:56 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: batch_size_warn_threshold_in_kb
>
>
>
> Nothing magic, just put in there based on experience. You can find the
> story behind the original recommendation here:
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6487
>
>
>
> The key reasoning comes from Patrick McFadden:
>
>
> "Yes that was in bytes. Just in my own experience, I don't recommend more
> than ~100 mutations per batch. Doing some quick math I came up with 5k as
> 100 x 50 byte mutations.
>
> Totally up for debate."
>
>
>
> It's totally changeable. However, it's there in no small part because so
> many people mistake the BATCH keyword for a performance optimization, and
> the warning helps flag those cases of misuse.
>
>
>
> On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller <moham...@glassbeam.com>
> wrote:
>
> Hi –
>
> The cassandra.yaml file has a property called
> *batch_size_warn_threshold_in_kb*.
>
> The default size is 5kb and according to the comments in the yaml file, it
> is used to log WARN on any batch size exceeding this value in kilobytes. It
> says caution should be taken on increasing the size of this threshold as it
> can lead to node instability.
>
>
>
> Does anybody know the significance of this magic number 5kb? Why would a
> higher number (say 10kb) lead to node instability?
>
>
>
> Mohammed
>
>
>
>
> --
>
> Ryan Svihla
>
> Solution Architect
>
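
For reference, the sizes quoted in the thread above are easy to
reproduce.  This is only a back-of-the-envelope sketch: it assumes round
1000-byte kilobytes, as Patrick's quick math does, and the server's
actual accounting may differ.  The function name and the bytes_per_kb
parameter are made up for illustration.

```python
def mutations_to_reach_threshold(bytes_per_mutation, threshold_kb=5,
                                 bytes_per_kb=1000):
    # Assumption: round 1000-byte kilobytes, matching the thread's quick
    # math. The server's real size accounting may use a different unit.
    threshold_bytes = threshold_kb * bytes_per_kb
    # Integer ceiling division: smallest count whose total size reaches
    # the threshold.
    return -(-threshold_bytes // bytes_per_mutation)


print(mutations_to_reach_threshold(50))    # 100 (Patrick's 100 x 50 bytes)
print(mutations_to_reach_threshold(1000))  # 5   (Mohammed's example)
```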
