Re: duplicate rows for partition

2018-08-22 Thread James Shaw
Can you run this:
select associate_degree, writetime( associate_degree ) from user_data where


Thanks,

James
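
For reading the result: WRITETIME normally returns microseconds since the Unix epoch, unless the client supplied its own USING TIMESTAMP. Below is a minimal Python sketch for turning those values into readable UTC times so the two rows can be compared. The helper name writetime_to_utc and the sample value are illustrative only and do not come from this thread.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def writetime_to_utc(writetime_us: int) -> datetime:
    """Convert a WRITETIME value (assumed microseconds since epoch) to UTC."""
    # Adding a timedelta keeps full microsecond precision.
    return EPOCH + timedelta(microseconds=writetime_us)


if __name__ == "__main__":
    # Hypothetical value; substitute what the query returns for each row.
    print(writetime_to_utc(1534982400000001).isoformat())
```

If the two rows show different writetimes, that at least tells you they came from separate writes.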



Re: duplicate rows for partition

2018-08-22 Thread James Shaw
Can you run this:
select writetime( associate_degree ) from user_data where 
and see what the writetimes are?



Re: duplicate rows for partition

2018-08-22 Thread James Shaw
Interesting. What are the insert and select statements?

Thanks,

James



Re: duplicate rows for partition

2018-08-22 Thread Gosar M
CREATE TABLE user_data (
    "userid" text,
    "secondaryid" text,
    "tDate" timestamp,
    "tid3" text,
    "sid4" text,
    "pid5" text,
    associate_degree text,
    PRIMARY KEY (("userid", "secondaryid"), "tDate", "tid3", "sid4", "pid5")
) WITH CLUSTERING ORDER BY ("tDate" ASC, "tid3" ASC, "sid4" ASC, "pid5" ASC);





Re: duplicate rows for partition

2018-08-22 Thread dinesh.jo...@yahoo.com.INVALID
What is the schema of the table? Could you include the output of DESCRIBE?
Dinesh 

  

duplicate rows for partition

2018-08-22 Thread Gosar M
Hello,

We have a table with the following partition and clustering keys:

partition key - ("userid", "secondaryid")
clustering key - "tDate", "tid3", "sid4", "pid5"

Data is inserted based on the above partition and clustering keys. For one
record, we are seeing 2 rows returned when querying by both partition and
clustering key.

 userid       | secondaryid | tdate                | tid3        | sid4 | pid5        | associate_degree
--------------+-------------+----------------------+-------------+------+-------------+--------------------
 090sdfdsf898 | ab984564    | 2018-08-04 07:59:59+ | 0a5995672e3 | l34  | l34_listing | 123145979615694
 090sdfdsf898 | ab984564    | 2018-08-04 07:59:59+ | 0a5995672e3 | l34  | l34_listing | 123145979615694989

We did not have any node down for longer than gc_grace_seconds.


Thank you. 


Re: Work in Progress - Awesome Cassandra Resources w/ Outline

2018-08-22 Thread Patrick Goliwas
If there is an interest within this thread and the community to include 
information about a growing segment of Cassandra-related tools for 
backup/recovery and data mobility, let me know and I can forward documentation 
for you to include in your guide. 




Re: Work in Progress - Awesome Cassandra Resources w/ Outline

2018-08-22 Thread Rahul Singh
Horia,

Thanks! I added the links. I can look into contributing this to the blog. I 
mainly curate this list to be a definitive guide to, eventually, all 
Cassandra-related things, which would include DataStax, Scylla, Yugabyte, 
Cosmos, etc., which may or may not be related to Apache Cassandra per se.


Thanks for the suggestion!

Rahul
On Aug 9, 2018, 3:55 AM -0500, Horia Mocioi , wrote:
> Hello Rahul,
>
> Great compilation of resources.
>
> Maybe add this one to the Blogs category?
> https://lostechies.com/ryansvihla/tags
>
> This one is also quite good, I would say:
> https://academy.datastax.com/support-blog/deeper-dive-diagnosing-dse-performance-issues-ttop-and-multidump
>
> And since there is now an official blog, wouldn't it be good to have these
> resources there?
>
> Regards,
> Horia
>
> On Wed, 2018-08-08 at 07:14 -0400, Rahul Singh wrote:
> > Folks,
> >
> > I've cleaned up the awesome-cassandra README which I've been working
> > on and published it as a GitHub page. The goal is to make an
> > authoritative list of resources sourced from the community.
> >
> > https://anant.github.io/awesome-cassandra
> >
> > TLDR:
> > 1. Work to be done: organizing more posts into the current outline so
> > that it's a logical organization of subject areas. e.g. all the blog
> > posts related to sstable management vs. shit related to hints.
> >
> > 2. Looking for: sources, especially from this community, in the form of
> > existing blogs w/ multiple posts whether from individuals or
> > companies - so either make a pull request or just submit to me
> > directly - via email or issue.
> >
> > Thanks,
> >
> > https://anant.github.io/awesome-cassandra
> >
> >
> >
> > I'm still working on the searchable index w/ facets ... but that's
> > also a parallel work in progress.
> >
> >
> >
> > Make it a great week,
> > Rahul 


Re: JBOD disk failure - just say no

2018-08-22 Thread Jonathan Haddad
We recently helped a team deal with some JBOD issues. They can be quite
painful, and the experience depends a bit on the C* version in use.  We
wrote a blog post about it (published today):

http://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html

Hope this helps.

Jon

On Mon, Aug 20, 2018 at 5:49 PM James Briggs 
wrote:

> Cassandra JBOD has a bunch of issues, so I don't recommend it for
> production:
>
> 1) disks fill up with load (data) unevenly, meaning you can run out of
> space on one disk while others are half-full
> 2) one bad disk can take out the whole node
> 3) instead of a small failure probability on an LVM/RAID volume, with JBOD
> you end up near 100% chance of failure after 3 years or so.
> 4) generally you will not have enough warning of a looming failure with
> JBOD compared to LVM/RAID. (Some companies take a week or two to replace
> a failed disk.)
>
> JBOD is easy to set up, but hard to manage.
>
> Thanks, James.
>
>
>
> --
> *From:* kurt greaves 
> *To:* User 
> *Sent:* Friday, August 17, 2018 5:42 AM
> *Subject:* Re: JBOD disk failure
>
> As far as I'm aware, yes. I recall hearing someone mention tying system
> tables to a particular disk but at the moment that doesn't exist.
>
> On Fri., 17 Aug. 2018, 01:04 Eric Evans, 
> wrote:
>
> On Wed, Aug 15, 2018 at 3:23 AM kurt greaves  wrote:
> > Yep. It might require a full node replace depending on what data is lost
> > from the system tables. In some cases you might be able to recover from
> > partially lost system info, but it's not a sure thing.
>
> Ugh, does it really just boil down to what part of `system` happens to
> be on the disk in question?  In my mind, that makes the only sane
> operational procedure for a failed disk to be: "replace the entire
> node".  IOW, I don't think we can realistically claim you can survive
> a failed JBOD device if it relies on happenstance.
>
> > On Wed., 15 Aug. 2018, 17:55 Christian Lorenz,
> > <christian.lor...@webtrekk.com> wrote:
> >>
> >> Thank you for the answers. We are using the current version, 3.11.3, so
> >> this one includes CASSANDRA-6696.
> >>
> >> So if I get this right, losing system tables will need a full node
> >> rebuild. Otherwise repair will get the node consistent again.
> >
> > [ ... ]
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
>
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [Cassandra] nodetool compactionstats not showing pending task.

2018-08-22 Thread Oleksandr Shulgin
On Fri, May 5, 2017 at 1:20 PM Alain RODRIGUEZ  wrote:

> Sorry to hear the restart did not help.
>

Hi,

We have been hitting the same issue for a few weeks on version 3.0.16.
Normally, restarting an affected node helps, but this is something we would
like to avoid doing.

What makes it worse for us is that Cassandra Reaper stops scheduling new
repair jobs if it sees that the node has more than 20 pending compaction
tasks.  We could bump this threshold, but in general the estimate could be
more accurate (or the actual tasks should be started in a timely manner).

>> Maybe try to monitor through JMX with
>> 'org.apache.cassandra.db:type=CompactionManager',
>> attribute 'Compactions' or 'CompactionsSummary'
>
> What is this attribute showing?
>

For example, I have right now a node showing "pending tasks: 16" and no
compaction running.  Here is the JMX output (well, via Jolokia):

"Compactions": [],
"CoreCompactorThreads": 1,
"CompactionSummary": [],
"MaximumCompactorThreads": 1,
"CoreValidationThreads": 1,
"MaximumValidatorThreads": 2147483647,
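
The check described above (pending tasks reported, but nothing listed under "Compactions") could be automated roughly like this. It is a sketch under assumptions: the base URL and port 8778 are placeholders for wherever the Jolokia agent listens, the function names are made up, and the pending-task count is passed in from whatever source you use (for example, nodetool compactionstats).

```python
import json
from urllib.request import urlopen

MBEAN = "org.apache.cassandra.db:type=CompactionManager"


def jolokia_read_url(base_url: str, mbean: str = MBEAN) -> str:
    """Build a Jolokia GET 'read' URL for all attributes of an MBean."""
    return f"{base_url.rstrip('/')}/read/{mbean}"


def fetch_attributes(base_url: str) -> dict:
    """Fetch the MBean attributes; Jolokia wraps them in a 'value' field."""
    with urlopen(jolokia_read_url(base_url), timeout=10) as resp:
        return json.load(resp)["value"]


def looks_stuck(attributes: dict, pending_tasks: int) -> bool:
    """True when tasks are pending but no compaction is actually running."""
    return pending_tasks > 0 and not attributes.get("Compactions")


if __name__ == "__main__":
    # Hypothetical endpoint and pending count; adjust for your node.
    attrs = fetch_attributes("http://localhost:8778/jolokia")
    print("stuck?", looks_stuck(attrs, pending_tasks=16))
```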

> Here is the Apache Cassandra Jira:
> https://issues.apache.org/jira/browse/CASSANDRA. You can search here
> (https://issues.apache.org/jira/browse/CASSANDRA-12529?jql=project%20%3D%20CASSANDRA%20AND%20text%20~%20%22pending%20compactions%22%20ORDER%20BY%20created%20DESC),
> for example.
>

I believe this is a separate issue.  There, some actual compaction tasks are
running but not making progress.  We have also never TRUNCATEd our tables,
and certainly not recently.

Any more pointers on how to debug this?

Regards,
--
Alex