[
https://issues.apache.org/jira/browse/CASSANDRA-16961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Johnny Miller updated CASSANDRA-16961:
--------------------------------------
Description:
When compaction encounters a large partition, it outputs a warning in the logs
e.g.:
(Apologies, had to redact some information)
WARN [CompactionExecutor:343] 2021-09-16 09:28:43,539 BigTableWriter.java:211 -
Writing large partition XXX/XXXX:sourceid:{color:#de350b}*2021-09-16
05\:00Z*{color} (1.381GiB) to sstable
/mnt/var/lib/cassandra/data/segment/message-336c5ff04db211ebbffc2980407d44d6/md-58982-big-Data.db
i.e.
[https://github.com/apache/cassandra/blob/cassandra-3.11.5/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L211]
*Example Table/insert*
CREATE TABLE myks.mytable (
sourceid text,
{color:#de350b}*messagehour timestamp,*{color}
messagetime timestamp,
messageid text,
PRIMARY KEY ((sourceid, messagehour), messagetime, messageid)
) ;
insert into myks.mytable (sourceid, messagehour, messagetime, messageid) values
('sourceid', '{color:#de350b}*2021-09-16 05:00Z*{color}', '2021-09-16
05:00:31Z', '123ABC');
To work out from the logs which nodes in the cluster hold the replica data for
this partition, I first get the token via CQL, e.g.:
select distinct token(sourceid,messagehour) from myks.mytable where
sourceid='sourceid' and messagehour='{color:#de350b}*2021-09-16 05:00Z*{color}';
system.token(sourceid, messagehour)
-------------------------------------
{color:#de350b}*7663675819538124697*{color}
I then run nodetool to get the endpoints for this token, keyspace and table,
e.g.:
nodetool getendpoints myks mytable {color:#de350b}*7663675819538124697*{color}
172.31.10.187
172.31.12.193
172.31.13.91
However, *the list of endpoints is not correct*. I suspect the timestamp value
printed in the warning log entry is missing information/precision, so querying
with it gives back the wrong token and hence the wrong endpoints.
Possibly this warning log statement should also output the actual partition key
token, in addition to the other information, to avoid confusion, and the string
representation of the timestamp should be correct.
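To illustrate the suspicion above, here is a minimal Python sketch of how a timestamp rendered at minute precision could map to a different partition key, and therefore a different token. It assumes the usual composite key layout (per component: 2-byte length, value, one zero byte) and a timestamp stored as signed 64-bit milliseconds since the epoch. The stored value with 123 ms is hypothetical, the helper names are made up for this example, and the token hash itself is not reproduced.

```python
import struct
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_millis(dt: datetime) -> int:
    """Milliseconds since the Unix epoch, computed with integer arithmetic."""
    delta = dt - EPOCH
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000

def composite_key(sourceid: str, messagehour: datetime) -> bytes:
    """Assumed composite layout: 2-byte length, value bytes, 0x00 per component."""
    components = (
        sourceid.encode("utf-8"),
        struct.pack(">q", epoch_millis(messagehour)),  # 8-byte big-endian millis
    )
    return b"".join(struct.pack(">H", len(c)) + c + b"\x00" for c in components)

# Hypothetical stored value that carries sub-minute precision.
stored = datetime(2021, 9, 16, 5, 0, 0, 123_000, tzinfo=timezone.utc)

# Render it at minute precision, as in the warning, then parse it back.
logged_text = stored.strftime("%Y-%m-%d %H:%MZ")
logged = datetime.strptime(logged_text, "%Y-%m-%d %H:%MZ").replace(tzinfo=timezone.utc)

key_stored = composite_key("sourceid", stored)
key_logged = composite_key("sourceid", logged)

print(logged_text)
print(epoch_millis(stored) - epoch_millis(logged), "ms lost in the log rendering")
print("keys identical:", key_stored == key_logged)
```

If the two keys differ by even one byte, the partitioner hashes them to unrelated tokens, which would explain an endpoint list that does not match the replicas actually holding the data.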
was:
When compaction encounters a large partition, it outputs a warning in the logs
e.g.:
(Apologies, had to redact some information)
WARN [CompactionExecutor:343] 2021-09-16 09:28:43,539 BigTableWriter.java:211 -
Writing large partition XXX/XXXX:PROsVuVbHju33:{color:#de350b}*2021-09-16
05\:00Z*{color} (1.381GiB) to sstable
/mnt/var/lib/cassandra/data/segment/message-336c5ff04db211ebbffc2980407d44d6/md-58982-big-Data.db
i.e
[https://github.com/apache/cassandra/blob/cassandra-3.11.5/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L211]
*Example Table/insert*
CREATE TABLE myks.mytable (
sourceid text,
{color:#de350b}*messagehour timestamp,*{color}
messagetime timestamp,
messageid text
PRIMARY KEY ((sourceid, messagehour), messagetime, messageid)
) ;
insert into myks.mytable (sourceid, messagehour, messagetime, messageid) values
('PROsVuVbHju33', '{color:#de350b}*2021-09-16 05:00Z'*{color}, '2021-09-16
05:00:31Z', '123ABC');
If I then need to try and work out which nodes in the cluster contain the
replica data for this partition (from the logs), I will get the token via CQL
eg:
select distinct token(sourceid,messagehour) from myks.mytable where
sourceid='PROsVuVbHju33' and messagehour='{color:#de350b}*2021-09-16
05:00Z*{color}';
system.token(sourceid, messagehour)
-------------------------------------
{color:#de350b}*7663675819538124697*{color}
I then run nodetool to get the endpoints for this token/ks/table
eg
nodetool getendpoints myks mytable {color:#de350b}*7663675819538124697*{color}
172.31.10.187
172.31.12.193
172.31.13.91
And *the list of endpoints is not correct* as the value outputted in the
timestamp warning log entry, I suspect, is missing additional
information/precision so obviously will give back the wrong token and hence the
wrong endpoints.
Possibly this warning log statement should output the actual partition key
token in addition to the other information to avoid confusion and the string
representation of the timestamp be correct.
> Timestamp String displayed for partition compaction warnings is not correct
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-16961
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16961
> Project: Cassandra
> Issue Type: Bug
> Reporter: Johnny Miller
> Priority: Normal
>