[akka-user] Re: [akka-persistance cassandra plugin] eventsByTag is not distributed good enough across cassandra cluster

Christopher Batey Wed, 01 Nov 2017 12:30:10 -0700

Hi Serhii

Yes, you're right: the current partition key only works if you have a 
smallish number (tens of thousands) of events per tag per day.


The main messages table works like your option #2 for persistence ids, but 
it all happens internally via the partition_nr column rather than the user 
having to do it.
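
As a rough illustration of that idea (a sketch only, not the plugin's actual code), a partition number can be derived from the sequence number so that each persistence id rolls over to a new partition after a fixed number of events. The class name, the method name and the partition size of 500000 are assumptions made for this example.

```java
public class PartitionNrSketch {
    // Assumed number of events stored per partition (hypothetical value).
    static final long TARGET_PARTITION_SIZE = 500_000L;

    // Maps a 1-based sequence number to a 0-based partition number.
    // Events 1..size land in partition 0, size+1..2*size in partition 1, etc.
    static long partitionNr(long sequenceNr, long targetPartitionSize) {
        return (sequenceNr - 1L) / targetPartitionSize;
    }

    public static void main(String[] args) {
        long[] samples = {1L, 500_000L, 500_001L, 1_250_000L};
        for (long seqNr : samples) {
            System.out.println("sequence_nr=" + seqNr
                + " -> partition_nr=" + partitionNr(seqNr, TARGET_PARTITION_SIZE));
        }
    }
}
```

Because the partition number is computed from data the writer already has, the user never needs to pick or rotate a key by hand.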

We're already planning on moving away from using materialised views due to 
the instability of the feature as reported on the cassandra dev mailing 
list (see https://github.com/akka/akka-persistence-cassandra/issues/247).

For the new solution (manually managing a separate table partitioned by tag 
and time window) I had planned to do what you suggest in #3, making the 
partitioning configurable to day, hour or minute. The downside of making it 
minute for everyone is that an eventsByTag query over a low number of events 
would need to query many, possibly empty, partitions. Not using a 
materialised view will also allow us to batch writes to the tag table, 
meaning we should be able to support any number of tags.
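
To make the trade-off concrete, here is a minimal sketch, assuming a bucket key formatted from the event timestamp in UTC. It shows how the bucket size changes both the key and the number of partitions a query over a time range has to visit for one tag. The enum, its constants and the hour and minute patterns are hypothetical. Only the "yyyyMMdd" day pattern comes from the current plugin as quoted below.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class TimeBucketSketch {
    // Hypothetical bucket sizes; the names and the HOUR/MINUTE patterns are assumptions.
    enum BucketSize {
        DAY("yyyyMMdd", ChronoUnit.DAYS),
        HOUR("yyyyMMddHH", ChronoUnit.HOURS),
        MINUTE("yyyyMMddHHmm", ChronoUnit.MINUTES);

        final DateTimeFormatter formatter;
        final ChronoUnit unit;

        BucketSize(String pattern, ChronoUnit unit) {
            this.formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
            this.unit = unit;
        }

        // Bucket key that would sit next to the tag in the partition key.
        String bucketOf(Instant timestamp) {
            return formatter.format(timestamp);
        }

        // Number of buckets (partitions per tag) touched by a query
        // covering the range from..to, both ends inclusive.
        long bucketsBetween(Instant from, Instant to) {
            return unit.between(from.truncatedTo(unit), to.truncatedTo(unit)) + 1;
        }
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2017-11-01T00:00:00Z");
        Instant to = Instant.parse("2017-11-01T12:30:10Z");
        for (BucketSize size : BucketSize.values()) {
            System.out.println(size + ": bucket=" + size.bucketOf(to)
                + ", partitions to query=" + size.bucketsBetween(from, to));
        }
    }
}
```

For the half-day range in the example, a day bucket means one partition, an hour bucket 13 and a minute bucket 751, most of which would be empty for a low-volume tag.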

I should have something for you to try out next week and it would be great 
to get your feedback.

On Wednesday, 1 November 2017 17:17:43 UTC, Serhii Nesteruk wrote:
>
> Hello
>
> eventsByTag in the CassandraReadJournal uses a materialized view to read 
> the events. Currently the materialized view is created by:
>
> CREATE MATERIALIZED VIEW IF NOT EXISTS $eventsByTagViewName$tagId AS
>    SELECT tag$tagId, timebucket, timestamp, persistence_id, partition_nr,
>      sequence_nr, writer_uuid, ser_id, ser_manifest, event_manifest, event, message
>    FROM $tableName
>    WHERE persistence_id IS NOT NULL AND partition_nr IS NOT NULL AND sequence_nr IS NOT NULL
>      AND tag$tagId IS NOT NULL AND timestamp IS NOT NULL AND timebucket IS NOT NULL
>    PRIMARY KEY ((tag$tagId, timebucket), timestamp, persistence_id, partition_nr, sequence_nr)
>    WITH CLUSTERING ORDER BY (timestamp ASC)
>
>
> The partition key is (tag$tagId, timebucket), where timebucket has the 
> following format: DateTimeFormatter.ofPattern("yyyyMMdd").
> I've got a huge number of events with the same tag. As a result, all events 
> for one day are stored on a single Cassandra node. Since all nodes 
> participate in writing events, this "materialized view" node slows down 
> the whole system.
>
> Possible workarounds:
> 1. Use a set of tagIds instead of one. A simple hash function can be used 
> to calculate the tagId: hash(event) % m. But in this case it may slow down 
> queries on the read side, as they would have to find events across all 
> nodes in the cluster.
> 2. Use a counter-based solution: change the tagId every 10k events.
> 3. Contribute to the Cassandra plugin to make timebucket configurable and 
> add minutes to the pattern.
>
> Does this make sense? I've got some doubts :) because, anyway, one of the 
> nodes will "suffer" from the materialized view and slow down the whole 
> system.
>
> I'll be glad to hear any thoughts about it.
>
> Thanks,
>   Serhii
>
