[jira] [Commented] (CASSANDRA-11656) sstabledump has inconsistency in deletion_time printout

Wei Deng (JIRA) Tue, 26 Apr 2016 15:15:52 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259049#comment-15259049
 ]


Wei Deng commented on CASSANDRA-11656:
--------------------------------------

I tested out the patch [~cnlwsu] provided in CASSANDRA-11655. However, I still 
see some discrepancies like the following:

{noformat}
~/cassandra-trunk/tools/bin/sstabledump ma-15-big-Data.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 18,
        "clustering" : [ "c1" ],
        "liveness_info" : { "tstamp" : 1461646542601774 },
        "cells" : [
          { "name" : "val0_int", "deletion_info" : { "tstamp" : 1461649343 },
            "tstamp" : 1461649343000508
          },
          { "name" : "val1_set_of_int", "deletion_info" : { "deletion_time" : 1461647295880443, "tstamp" : 1461647295 } },
          { "name" : "val1_set_of_int", "path" : [ "1" ], "deletion_info" : { "tstamp" : 1461647320 },
            "tstamp" : 1461647320160261
          },
          { "name" : "val1_set_of_int", "path" : [ "10" ], "value" : "", "tstamp" : 1461647295880444 },
          { "name" : "val1_set_of_int", "path" : [ "11" ], "value" : "", "tstamp" : 1461647295880444 },
          { "name" : "val1_set_of_int", "path" : [ "12" ], "value" : "", "tstamp" : 1461647295880444 }
        ]
      },
      {
        "type" : "row",
        "position" : 86,
        "clustering" : [ "c2" ],
        "deletion_info" : { "deletion_time" : 1461647588089843, "tstamp" : 1461647588 },
        "cells" : [ ]
      },
      {
        "type" : "row",
        "position" : 101,
        "clustering" : [ "c4" ],
        "liveness_info" : { "tstamp" : 1461649635932899 },
        "cells" : [ ]
      },
      {
        "type" : "row",
        "position" : 114,
        "clustering" : [ "c5" ],
        "liveness_info" : { "tstamp" : 1461650266651050, "ttl" : 60, "expires_at" : 1461650326, "expired" : true },
        "cells" : [
          { "name" : "val0_int", "value" : "500", "tstamp" : 1461650241403672 },
          { "name" : "val1_set_of_int", "deletion_info" : { "deletion_time" : 1461650241403671, "tstamp" : 1461650241 } },
          { "name" : "val1_set_of_int", "path" : [ "111" ], "value" : "", "tstamp" : 1461650241403672 },
          { "name" : "val1_set_of_int", "path" : [ "222" ], "value" : "", "tstamp" : 1461650241403672 },
          { "name" : "val1_set_of_int", "path" : [ "333" ], "value" : "", "tstamp" : 1461650241403672 }
        ]
      },
      {
        "type" : "row",
        "position" : 180,
        "clustering" : [ "c6" ],
        "deletion_info" : { "deletion_time" : 1461708091029189, "tstamp" : 1461708091 },
        "cells" : [ ]
      }
    ]
  }
]
{noformat}

IMHO, if we decide to use "tstamp" to represent the write timestamp (whether
the write is a delete or a regular mutation), then it should always be in
microseconds since epoch (16 digits), consistently across regular cells and
tombstones.

In my view, "deletion_time" is a good short name for localDeletionTime (which
only tells compaction when the tombstone can be GCed). As long as we
consistently use it to represent localDeletionTime, which has only 10 digits
(seconds since epoch), that works for me too.
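
To make the convention above concrete, here is a minimal sketch of a check
that could be run over sstabledump's JSON output. It is not part of
sstabledump or of the patch; the helper names (unit_of, find_unit_mismatches)
are hypothetical, and guessing the unit from the digit count is only a
heuristic that holds for present-day timestamps. It assumes "tstamp" should
always be microseconds and "deletion_time" should always be seconds.

```python
import json

# Convention proposed in this comment (an assumption of this sketch):
# "tstamp" is microseconds since epoch, "deletion_time" is seconds since epoch.
EXPECTED = {"tstamp": "micros", "deletion_time": "seconds"}


def unit_of(value):
    """Guess the unit of an epoch value from its digit count (heuristic)."""
    digits = len(str(abs(int(value))))
    if digits == 16:
        return "micros"
    if digits == 10:
        return "seconds"
    return "unknown"


def find_unit_mismatches(node, path="$"):
    """Walk parsed JSON and return (path, value, guessed_unit) for each
    "tstamp"/"deletion_time" field whose magnitude contradicts EXPECTED."""
    mismatches = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in EXPECTED and isinstance(value, int):
                unit = unit_of(value)
                if unit != EXPECTED[key]:
                    mismatches.append((child, value, unit))
            else:
                mismatches.extend(find_unit_mismatches(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            mismatches.extend(find_unit_mismatches(item, f"{path}[{i}]"))
    return mismatches


# The "c2" row from the output above, where the two fields look swapped.
sample = json.loads("""
{ "type" : "row",
  "clustering" : [ "c2" ],
  "deletion_info" : { "deletion_time" : 1461647588089843, "tstamp" : 1461647588 },
  "cells" : [ ] }
""")

for where, value, unit in find_unit_mismatches(sample):
    print(where, value, "looks like", unit)
```

On the "c2" row this flags both fields of deletion_info, while a regular
cell such as { "tstamp" : 1461646542601774 } passes.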

> sstabledump has inconsistency in deletion_time printout
> -------------------------------------------------------
>
>                 Key: CASSANDRA-11656
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11656
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Wei Deng
>              Labels: Tools
>
> See the following output (note the deletion info under the second row):
> {noformat}
> [
>   {
>     "partition" : {
>       "key" : [ "1" ],
>       "position" : 0
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 18,
>         "clustering" : [ "c1" ],
>         "liveness_info" : { "tstamp" : 1461646542601774 },
>         "cells" : [
>           { "name" : "val0_int", "deletion_time" : 1461647421, "tstamp" : 1461647421344759 },
>           { "name" : "val1_set_of_int", "path" : [ "1" ], "deletion_time" : 1461647320, "tstamp" : 1461647320160261 },
>           { "name" : "val1_set_of_int", "path" : [ "10" ], "value" : "", "tstamp" : 1461647295880444 },
>           { "name" : "val1_set_of_int", "path" : [ "11" ], "value" : "", "tstamp" : 1461647295880444 },
>           { "name" : "val1_set_of_int", "path" : [ "12" ], "value" : "", "tstamp" : 1461647295880444 }
>         ]
>       },
>       {
>         "type" : "row",
>         "position" : 85,
>         "clustering" : [ "c2" ],
>         "deletion_info" : { "deletion_time" : 1461647588089843, "tstamp" : 1461647588 },
>         "cells" : [ ]
>       }
>     ]
>   }
> ]
> {noformat}
> To avoid confusion, we need to have consistency in printing out the 
> DeletionTime object. By definition, markedForDeleteAt is in microseconds 
> since epoch and marks the time when the "delete" mutation happens; 
> localDeletionTime is in seconds since epoch and allows GC to collect the 
> tombstone if the current epoch second is greater than localDeletionTime + 
> gc_grace_seconds. I'm ok to use "tstamp" to represent markedForDeleteAt 
> because markedForDeleteAt does represent this delete mutation's timestamp, 
> but we need to be consistent everywhere.
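
The GC rule quoted above is a simple time comparison, which is why the unit
matters. A minimal sketch, assuming only the comparison described in the
issue text (real compaction applies further conditions that are not modeled
here); the function name is hypothetical, and 864000 is Cassandra's default
gc_grace_seconds, used purely for illustration:

```python
def tombstone_is_purgeable(local_deletion_time, gc_grace_seconds, now_seconds):
    """Time check from the issue description: the tombstone may be collected
    once the current epoch second exceeds localDeletionTime + gc_grace_seconds.
    All three arguments are in seconds since epoch (or plain seconds)."""
    return now_seconds > local_deletion_time + gc_grace_seconds


GC_GRACE = 864000  # 10 days; illustrative value

# localDeletionTime of the "c2" row, in seconds since epoch.
local_deletion_time = 1461647588

# One second past the grace period: eligible.
print(tombstone_is_purgeable(local_deletion_time, GC_GRACE,
                             local_deletion_time + GC_GRACE + 1))

# If a reader mistakes the 16-digit microsecond value for localDeletionTime,
# the same check stays false for any realistic "now".
print(tombstone_is_purgeable(1461647588089843, GC_GRACE,
                             local_deletion_time + GC_GRACE + 1))
```

This is the practical risk of printing the two fields under swapped or
inconsistent names: someone reading the dump to reason about when a tombstone
becomes collectable would plug in the wrong number.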



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
