I have been circling around a thought process over batches. Now that Cassandra has aggregate functions, it might be possible to write a type of record that has an END_OF_BATCH marker, with the data suppressed from view until it is all there.
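Concretely, that idea could be sketched client-side like this. Everything below is hypothetical (the marker key, the function names, a plain dict standing in for a table) - none of it is an existing Cassandra feature:

```python
import hashlib
import json

# Hypothetical marker key; nothing here is an existing Cassandra API.
END_OF_BATCH = "__END_OF_BATCH__"

def batch_checksum(rows):
    """Stable checksum over the batch payloads."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return h.hexdigest()

def write_batch(store, batch_id, rows):
    """Write the rows, then a final marker row a reader can verify."""
    for i, row in enumerate(rows):
        store[(batch_id, i)] = row
    store[(batch_id, END_OF_BATCH)] = {
        "count": len(rows),
        "checksum": batch_checksum(rows),
    }

def read_batch(store, batch_id):
    """Return the rows only if the batch is provably complete."""
    marker = store.get((batch_id, END_OF_BATCH))
    if marker is None:
        return None  # marker not written yet: suppress the batch
    rows = [store[(batch_id, i)]
            for i in range(marker["count"]) if (batch_id, i) in store]
    if len(rows) != marker["count"] or batch_checksum(rows) != marker["checksum"]:
        return None  # rows missing or corrupt: suppress the batch
    return rows
```

In Cassandra terms, the marker would just be one more row written last, and the suppression would live in an intelligent client that refuses to surface the batch until the marker checks out.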
I.e., you write something like a checksum record that an intelligent client can use to tell whether the rest of the batch is complete.

On Wed, Dec 7, 2016 at 11:58 AM, Voytek Jarnot <voytek.jar...@gmail.com> wrote:

> Been about a month since I gave up on it, but it was very much related to
> the stuff you're dealing with ... basically Cassandra just stepping on its
> own.... errrrr, tripping over its own feet streaming MVs.
>
> On Dec 7, 2016 10:45 AM, "Benjamin Roth" <benjamin.r...@jaumo.com> wrote:
>
>> I meant the MV thing
>>
>> On Dec 7, 2016 at 17:27, "Voytek Jarnot" <voytek.jar...@gmail.com> wrote:
>>
>>> Sure, about which part?
>>>
>>> The default batch size warning threshold is 5kb. I've increased it to
>>> 30kb, and will need to increase it to 40kb (8x the default setting) to
>>> avoid WARN log messages about batch sizes. I realize it's just a
>>> WARNing, but I may as well avoid those if I can configure it out. That
>>> said, having to increase it so substantially (and we're only dealing
>>> with 5 tables) makes me wonder whether I'm taking the correct approach
>>> in using batches to guarantee atomicity.
>>>
>>> On Wed, Dec 7, 2016 at 10:13 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> Could you please be more specific?
>>>>
>>>> On Dec 7, 2016 at 17:10, "Voytek Jarnot" <voytek.jar...@gmail.com> wrote:
>>>>
>>>>> Should've mentioned - running 3.9. Also - please do not recommend
>>>>> MVs: I tried them, they're broken, we punted.
>>>>>
>>>>> On Wed, Dec 7, 2016 at 10:06 AM, Voytek Jarnot
>>>>> <voytek.jar...@gmail.com> wrote:
>>>>>
>>>>>> The low default value for batch_size_warn_threshold_in_kb is making
>>>>>> me wonder if I'm perhaps approaching the problem of atomicity in a
>>>>>> non-ideal fashion.
>>>>>>
>>>>>> With one data set duplicated/denormalized into 5 tables to support
>>>>>> queries, we use batches to ensure inserts make it to all of the
>>>>>> tables or none. This works fine, but I've had to bump the warn
>>>>>> threshold and fail threshold substantially (8x higher for the warn
>>>>>> threshold). This - in turn - makes me wonder, with a default setting
>>>>>> so low, whether I'm solving this problem in the canonical/standard
>>>>>> way.
>>>>>>
>>>>>> Mostly just looking for confirmation that we're not unintentionally
>>>>>> doing something weird...
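For anyone following along, the thresholds in question are set in cassandra.yaml. The values below are only an illustration of the 8x bump described above, assuming the 3.x defaults of 5kb (warn) and 50kb (fail):

```yaml
# cassandra.yaml - logged batch size thresholds (illustrative values)
batch_size_warn_threshold_in_kb: 40    # 8x the 5kb default, per the thread
batch_size_fail_threshold_in_kb: 100   # keep comfortably above the warn value
```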