[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073141#comment-14073141
 ] 

Lyuben Todorov commented on CASSANDRA-6572:
-------------------------------------------

bq. It looks to me like you need some way to share the statement preparation 
across threads, as it can be used by any thread (and across log segments) once 
prepared. Probably easiest to do it during parsing of the log file

That seems simple enough: a concurrent map shared by the threads of a 
WorkloadReplayer should do the job. The problem with preparing statements 
whilst parsing the log is that a statement might be for a ks / cf that hasn't 
been created yet.
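
One way to sidestep the ordering problem is to prepare lazily on first use at 
replay time rather than at parse time. The following is only a rough sketch 
under assumptions: the class name {{SharedStatementCache}}, keying by the raw 
query string, and the {{preparer}} callback are all hypothetical and not taken 
from the patch, and {{computeIfAbsent}} requires Java 8.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch: one instance per WorkloadReplayer, shared by all replay threads.
final class SharedStatementCache<S> {
    private final ConcurrentMap<String, S> prepared = new ConcurrentHashMap<>();
    // Assumption: the caller supplies whatever actually prepares a query string.
    private final Function<String, S> preparer;

    SharedStatementCache(Function<String, S> preparer) {
        this.preparer = preparer;
    }

    // Prepares the statement the first time any thread asks for it and reuses
    // the result afterwards. If the preparer throws (for example because the
    // ks / cf does not exist yet), nothing is cached and a later call retries.
    S get(String cql) {
        return prepared.computeIfAbsent(cql, preparer);
    }

    int size() {
        return prepared.size();
    }
}
```

Because preparation is deferred until the statement is first executed, any 
schema changes that appear earlier in the log would already have been replayed 
by then.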

bq. We also have an issue with replay potentially over-parallelizing, and also 
potentially OOMing, as you're submitting straight to a thread pool after 
parsing each file. So there's nothing stopping us racing ahead and reading all 
of the log files (you have an unbounded queue)

A possible solution is to move the multimap to the class level rather than 
having {{WP#read}} create one each time it's called (again per WorkloadReplayer, 
which is fine since we should only have one per replay). Then every time a read 
completes we submit the collection of {{QuerylogSegments}} to be replayed, 
empty the map, and populate it again if the same thread-id is met in 
{{WP#read}}. The tricky part is submitting a thread-id's segments only once we 
know the executor doesn't already have a task running for that thread-id.
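
For illustration, the "one task per thread-id at a time" constraint and the 
bound on read-ahead could be combined roughly as below. This is a sketch under 
assumptions, not what the patch does: {{PerThreadIdSubmitter}} and 
{{maxPending}} are made-up names, it chains each segment onto the previous 
future for the same thread-id, and it uses a semaphore so the reader blocks 
instead of racing ahead through the log files.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: serialises work per logged thread-id and bounds pending work.
final class PerThreadIdSubmitter {
    // Last submitted task for each thread-id; new work is chained after it.
    private final ConcurrentMap<Long, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();
    private final Executor executor;
    // Caps the number of segments submitted but not yet finished.
    private final Semaphore permits;

    PerThreadIdSubmitter(Executor executor, int maxPending) {
        this.executor = executor;
        this.permits = new Semaphore(maxPending);
    }

    // Blocks the caller (the log reader) while maxPending segments are outstanding,
    // then queues the segment to run after the previous one for the same thread-id.
    CompletableFuture<Void> submit(long threadId, Runnable segment) {
        permits.acquireUninterruptibly();
        return tails.compute(threadId, (id, tail) -> {
            CompletableFuture<Void> prev =
                (tail == null) ? CompletableFuture.<Void>completedFuture(null) : tail;
            // handleAsync runs whether or not the previous segment failed.
            return prev.<Void>handleAsync((ignored, error) -> {
                try {
                    segment.run();
                } finally {
                    permits.release();
                }
                return null;
            }, executor);
        });
    }
}
```

Segments for different thread-ids can still run in parallel on the executor. 
The map of tails is never pruned here, which a real implementation would 
probably want to address.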

bq. Also, we're still replaying based on offset from last query, which means we 
will skew very quickly. We should be fixing an epoch (in nanos) such that you 
have a log epoch of L, and queries are run at T=L+X; when re-run we have a 
replay epoch of R, and we run queries at R+X
 
It's on the todo list.
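
As a note for when this gets picked up, the arithmetic from the quoted 
suggestion can be sketched as follows. {{ReplayClock}} and its method names are 
hypothetical; only the L, R and X relationship comes from the comment above.

```java
// Hypothetical sketch of epoch-based scheduling: a query logged at T = L + X
// is replayed at R + X, so delays do not accumulate from query to query.
final class ReplayClock {
    private final long logEpochNanos;    // L: epoch fixed when the log was recorded
    private final long replayEpochNanos; // R: epoch fixed when the replay starts

    ReplayClock(long logEpochNanos, long replayEpochNanos) {
        this.logEpochNanos = logEpochNanos;
        this.replayEpochNanos = replayEpochNanos;
    }

    // Absolute replay time for a query that was logged at loggedAtNanos.
    long targetNanos(long loggedAtNanos) {
        long offset = loggedAtNanos - logEpochNanos; // X
        return replayEpochNanos + offset;            // R + X
    }

    // How long to wait from nowNanos before running the query; zero if already late.
    long delayNanos(long loggedAtNanos, long nowNanos) {
        return Math.max(0L, targetNanos(loggedAtNanos) - nowNanos);
    }
}
```

Since every target time is computed from the fixed replay epoch, a query that 
runs late does not push back the ones after it, unlike scheduling by offset 
from the previous query.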

> Workload recording / playback
> -----------------------------
>
>                 Key: CASSANDRA-6572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>             Fix For: 2.1.1
>
>         Attachments: 6572-trunk.diff
>
>
> "Write sample mode" gets us part way to testing new versions against a real 
> world workload, but we need an easy way to test the query side as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
