Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-16 Thread Ahmed Eljami
The issue is fixed with nodetool scrub; now both rows are under the same
clustering.

I'll open a JIRA to analyze the source of this issue with Cassandra 3.11.3.

Thanks.
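
(For reference, the repair boiled down to something like the following; ks1/t1
and the data path are placeholders for the real keyspace/table, and scrub has
to be run on every node that owns the partition:)

# rewrite the sstables of the affected table -- this is what merged the two
# fragments back under a single clustering
nodetool scrub ks1 t1
# then check that the partition now holds a single row for that clustering
sstabledump /var/lib/cassandra/data/ks1/t1-*/*-Data.db | less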


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
I don’t have a good answer for you - I don’t know if scrub will fix this (you 
could copy an sstable offline and try it locally in ccm) - you may need to 
delete and reinsert, though I’m really interested in knowing how this happened 
if you weren’t ever exposed to #14008. 
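
(A rough offline sketch of that test with ccm; the keyspace/table ks1/t1, the
sstable names and the ~/.ccm data paths are placeholders, and the exact ccm
invocations may need adjusting for your setup:)

ccm create scrubtest -v 3.11.3 -n 1 -s     # single-node local cluster
ccm node1 cqlsh                            # recreate the keyspace/table schema here
ccm node1 stop
# drop the suspect sstable set into the node's data directory
cp /backup/ks1/t1/mc-42-big-* ~/.ccm/scrubtest/node1/data0/ks1/t1-*/
ccm node1 start
ccm node1 nodetool scrub ks1 t1
sstabledump ~/.ccm/scrubtest/node1/data0/ks1/t1-*/*-Data.db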

Can you open a JIRA? If your sstables aren’t especially sensitive, uploading
them would be swell. Otherwise, an anonymized JSON dump may be good enough for
whichever developer looks at fixing this.
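
(A hedged example of producing such a dump; the sstable path is a placeholder
and the sed line is only a crude way of blanking out values before attaching
the file to the JIRA:)

sstabledump /var/lib/cassandra/data/ks1/t1-*/mc-42-big-Data.db > partition.json
sed -i 's/"value" : "[^"]*"/"value" : "xxx"/g' partition.json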

-- 
Jeff Jirsa



Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Jeff, in this case is there any solution to resolve this directly in the
sstable (compact, scrub, ...), or do we have to run a batch at the client level
(delete the partition and rewrite it)?

Thank you for your reply.


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Indeed, this was written in 2.1.14, but we upgraded to 3.11.3, so we
should not be impacted by this issue?!
Thanks


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-14008

If this was written in 2.1/2.2 and you upgraded to 3.0.x (x < 16) or 
3.1-3.11.1, could be this issue. 
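
(One hedged way to double-check the exposure is to look at which sstable format
versions are sitting on disk for the table; the data path is a placeholder, and
the prefix-to-version mapping -- ka = 2.1, la = 2.2, ma/mb/mc = 3.0/3.11 -- is
from memory, so verify it against your own install:)

nodetool version
# the format version appears in each Data.db file name
ls /var/lib/cassandra/data/ks1/t1-*/*-Data.db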

-- 
Jeff Jirsa


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
What about this part of the dump:

"type" : "row",
"position" : 4123,
"clustering" : [ "", "Token", "abcd", "" ],
"cells" : [
  { "name" : "dvalue", "value" : "", "tstamp" :
"2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" :
"2020-04-27T17:20:31Z", "expired" : false }

Why don't we have a *liveness_info* for this row?

Thanks


Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Hi Sean,
Thanks for the reply.
I agree with you about uniqueness, but the output of sstabledump shows
that we have the same value for the column g => "clustering" : [ "",
"Token", "abcd", "" ],
and when we select with the whole primary key, using the values I see
in the sstable, cqlsh returns 2 rows..
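
(For context, the select in question was of this shape -- a hypothetical
reconstruction in which ks1/t1 is a placeholder table and the literals are
simply the partition key and clustering values shown in the dump, mapped onto
PRIMARY KEY ((a, b, c), d, e, f, g):)

cqlsh -e "SELECT * FROM ks1.t1
          WHERE a = '' AND b = 'bbb' AND c = 'rrr'
            AND d = '' AND e = 'Token' AND f = 'abcd' AND g = '';"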


RE: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Durity, Sean R
Uniqueness is determined by the partition key PLUS the clustering columns. Hard 
to tell from your data below, but is it possible that one of the clustering 
columns (perhaps g) has different values? That would easily explain the 2 rows 
returned – because they ARE different rows in the same partition. In your data 
model, make sure you need all the clustering columns to determine uniqueness or 
you will indeed have more rows than you might expect.

Sean Durity


From: Ahmed Eljami 
Sent: Wednesday, May 15, 2019 10:56 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Two separate rows for the same partition !!

Hi guys,

We have a strange problem with data in Cassandra: after inserting the same
partition twice with different columns, Cassandra returns 2 rows in cqlsh
rather than one...:

a| b| c| d| f| g| h| i| j| k| l
--++---+--+---+-++---+--++
|bbb|  rrr| | Token | abcd|| False | {'expiration': '1557943260838', 'fname': 'WS', 'freshness': '1556299239910'} |   null |   null
|bbb|  rrr| | Token | abcd||  null | null | |   null

With the primary key = PRIMARY KEY ((a, b, c), d, e, f, g)

On the sstable we have the following data:

[
  {
"partition" : {
  "key" : [ "", "bbb", "rrr" ],
  "position" : 3760
},
"rows" : [
  {
"type" : "range_tombstone_bound",
"start" : {
  "type" : "inclusive",
  "clustering" : [ "", "Token", "abcd", "*" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "range_tombstone_bound",
"end" : {
  "type" : "exclusive",
  "clustering" : [ "", "Token", "abcd", "" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "row",
"position" : 3974,
"clustering" : [ "", "Token", "abcd", "" ],
"liveness_info" : { "tstamp" : "2019-04-26T17:20:39.910Z", "ttl" : 
31708792, "expires_at" : "2020-04-27T17:20:31Z", "expired" : false },
"cells" : [
  { "name" : "connected", "value" : false },
  { "name" : "dattrib", "deletion_info" : { "marked_deleted" : 
"2019-04-26T17:20:39.90Z", "local_delete_time" : "2019-04-26T17:20:39Z" } },
  { "name" : "dattrib", "path" : [ "expiration" ], "value" : 
"1557943260838" },
  { "name" : "dattrib", "path" : [ "fname" ], "value" : "WS" },
  { "name" : "dattrib", "path" : [ "freshness" ], "value" : 
"1556299239910" }
]
  },
  {
"type" : "row",
"position" : 4123,
"clustering" : [ "", "Token", "abcd", "" ],
"cells" : [
  { "name" : "dvalue", "value" : "", "tstamp" : 
"2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" : 
"2020-04-27T17:20:31Z", "expired" : false }
]
  },
  {
"type" : "range_tombstone_bound",
"start" : {
  "type" : "exclusive",
  "clustering" : [ "", "Token", "abcd", "" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  },
  {
"type" : "range_tombstone_bound",
"end" : {
  "type" : "inclusive",
  "clustering" : [ "", "Token", "abcd", "*" ],
  "deletion_info" : { "marked_deleted" : "2019-04-26T17:20:39.909Z", 
"local_delete_time" : "2019-04-26T17:20:39Z" }
}
  }
]
  }

What's weird is that the two rows at "position" : 3974 and "position" : 4123
should be one and the same row...!!
Since then, we haven't been able to reproduce the issue ...

Any idea please?
Thanks.

--
Regards,
Ahmed ELJAMI


