Re: Practical limit on number of column families

2016-02-29 Thread Jack Krupansky
3,000 entries? What's an "entry"? Do you mean row, column, or... what?

You are using the obsolete terminology of CQL2 and Thrift ("column family");
with CQL3 you should be creating "tables". The practical recommendation of
an upper limit of a few hundred tables across all keyspaces remains.

Technically you can go higher, and technically you can reduce the overhead
per table (there is a Jira for it, intentionally left undocumented since the
change is strongly not recommended), but it is unlikely that you will be
happy with the results.

What is the nature of the use case?

You basically have two choices: add a column to a shared table (as part of
the primary key) to distinguish the different categories of data, or run
separate clusters, each holding at most a few hundred tables.
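
For illustration only, a minimal sketch of the shared-table approach
(hypothetical keyspace, table, and column names; the real schema would
depend on the workload):

CREATE TABLE my_keyspace.entries (
    category  text,   -- which logical "table" this row belongs to
    entry_key text,   -- the key within that logical table
    value     text,
    PRIMARY KEY ((category, entry_key))
);

SELECT value FROM my_keyspace.entries
WHERE category = 'category_a' AND entry_key = 'some-key';

Keeping the category inside the partition key keeps partitions bounded and
spreads the different categories across the cluster.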


-- Jack Krupansky

On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
fernando.jime...@wealth-port.com> wrote:

> Hi all
>
> I have a use case for Cassandra that would require creating a large number
> of column families. I have found references to early versions of Cassandra
> where each column family would require a fixed amount of memory on all
> nodes, effectively imposing an upper limit on the total number of CFs. I
> have also seen rumblings that this may have been fixed in later versions.
>
> To put the question to rest, I have setup a DSE sandbox and created some
> code to generate column families populated with 3,000 entries each.
>
> Unfortunately I have now hit this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-9291
>
> So I will have to retest against Cassandra 3.0 instead
>
> However, I would like to understand the limitations regarding creation of
> column families.
>
> * Is there a practical upper limit?
> * is this a fixed limit, or does it scale as more nodes are added into the
> cluster?
> * Is there a difference between one keyspace with thousands of column
> families, vs thousands of keyspaces with only a few column families each?
>
> I haven’t found any hard evidence/documentation to help me here, but if
> you can point me in the right direction, I will oblige and RTFM away.
>
> Many thanks for your help!
>
> Cheers
> FJ
>
>
>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Noorul Islam K M
Lyubo Kamenov  writes:

> Maybe increase the number of tables that can be compacted by minor
> compactions[1],
> i.e. max_threshold (default is set to 32).
>
> 1.
> https://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html?scroll=compactSubprop__compactionSubpropertiesDTCS
>

I see that after changing timestamp_resolution to MICROSECONDS, the sstable
count is slowly decreasing. I will give it some more time, and if that does
not help I will try setting a higher value for max_threshold.
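
As a reference point, one way to watch the per-table count while compactions
catch up might be (hypothetical keyspace/table names; on C* 2.1 / DSE 4.8 the
subcommand is cfstats):

$ nodetool cfstats my_keyspace.my_table | grep "SSTable count"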

Thanks and Regards
Noorul

> On Mon, Feb 29, 2016 at 9:28 PM, Noorul Islam Kamal Malmiyoda <
> noo...@noorul.com> wrote:
>
>> Hello Marcus,
>>
>> I altered the table to set timestamp_resolution to 'MICROSECONDS'. I
>> waited for some time, but the sstable count did not come down. Do you
>> think I should run a specific command to reduce the number of sstables
>> after setting this?
>>
>> Thanks and Regards
>> Noorul
>>
>>
>> On Mon, Feb 29, 2016 at 7:22 PM, Marcus Eriksson 
>> wrote:
>> > why do you have 'timestamp_resolution': 'MILLISECONDS'? It should be
>> > left as default (MICROSECONDS) unless you do inserts with an explicit
>> > "USING TIMESTAMP" clause, see
>> > https://issues.apache.org/jira/browse/CASSANDRA-11041
>> >
>> > On Mon, Feb 29, 2016 at 2:36 PM, Noorul Islam K M 
>> wrote:
>> >>
>> >>
>> >> Hi all,
>> >>
>> >> We are using below compaction settings for a table
>> >>
>> >> compaction = {'timestamp_resolution': 'MILLISECONDS',
>> >> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
>> >> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>> >>
>> >> But it is creating too many sstables. Currently number of sstables
>> >> is 4. We have been injecting data for the last three days.
>> >>
>> >> We have set the compactionthroughput to 128 MB/s
>> >>
>> >> $ nodetool getcompactionthroughput
>> >>
>> >> Current compaction throughput: 128 MB/s
>> >>
>> >> But this is not helping.
>> >>
>> >> How can we control the number of sstables in this case?
>> >>
>> >> Thanks and Regards
>> >> Noorul
>> >
>> >
>>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Lyubo Kamenov
Maybe increase the number of tables that can be compacted by minor
compactions[1],
i.e. max_threshold (default is set to 32).

1.
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html?scroll=compactSubprop__compactionSubpropertiesDTCS
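
For example (hypothetical keyspace/table names, and '64' is just an arbitrary
illustration; note that ALTER TABLE replaces the whole compaction map, so the
existing subproperties need to be restated alongside the new max_threshold):

ALTER TABLE my_keyspace.my_table WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'max_sstable_age_days': '365',
    'base_time_seconds': '60',
    'max_threshold': '64'};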

On Mon, Feb 29, 2016 at 9:28 PM, Noorul Islam Kamal Malmiyoda <
noo...@noorul.com> wrote:

> Hello Marcus,
>
> I altered the table to set timestamp_resolution to 'MICROSECONDS'. I
> waited for some time, but the sstable count did not come down. Do you
> think I should run a specific command to reduce the number of sstables
> after setting this?
>
> Thanks and Regards
> Noorul
>
>
> On Mon, Feb 29, 2016 at 7:22 PM, Marcus Eriksson 
> wrote:
> > why do you have 'timestamp_resolution': 'MILLISECONDS'? It should be
> > left as default (MICROSECONDS) unless you do inserts with an explicit
> > "USING TIMESTAMP" clause, see
> > https://issues.apache.org/jira/browse/CASSANDRA-11041
> >
> > On Mon, Feb 29, 2016 at 2:36 PM, Noorul Islam K M 
> wrote:
> >>
> >>
> >> Hi all,
> >>
> >> We are using below compaction settings for a table
> >>
> >> compaction = {'timestamp_resolution': 'MILLISECONDS',
> >> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
> >> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
> >>
> >> But it is creating too many sstables. Currently number of sstables
> >> is 4. We have been injecting data for the last three days.
> >>
> >> We have set the compactionthroughput to 128 MB/s
> >>
> >> $ nodetool getcompactionthroughput
> >>
> >> Current compaction throughput: 128 MB/s
> >>
> >> But this is not helping.
> >>
> >> How can we control the number of sstables in this case?
> >>
> >> Thanks and Regards
> >> Noorul
> >
> >
>


Re: how to read parent_repair_history table?

2016-02-29 Thread Jimmy Lin
Is there any other, better way to find out a node's token range? I see the
system.peers column family seems to include range information, which is
promising, but when I look at both the DataStax Java driver and the Python
driver, their APIs both require a keyspace name and a host, and I wonder why?


http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/Metadata.html#getTokenRanges-java.lang.String-com.datastax.driver.core.Host-


And just to be sure, the participants column in the repair_history table
represents the node being repaired, and not the node being used to compare
the data against, correct?
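
As a rough alternative sketch, the raw token assignments can also be read
straight from the system tables with plain CQL; this returns the tokens each
node owns rather than ready-made ranges, so the ranges between consecutive
tokens still have to be derived by the caller:

SELECT tokens FROM system.local;        -- tokens owned by the node you are connected to
SELECT peer, tokens FROM system.peers;  -- tokens of the other nodes, as seen by this node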


On Thu, Feb 25, 2016 at 1:38 PM, Paulo Motta 
wrote:

> > how does it work when a repair job targets only the local DC vs all
> > DCs? is there any column or flag I can tell the difference from? or
> > does it actually matter?
>
> You cannot easily find out from the parent_repair_history table whether a
> repair is local-only or multi-DC. I created
> https://issues.apache.org/jira/browse/CASSANDRA-11244 to add more
> information to that table. Since that table only has id as its primary key,
> you'd need to do a full scan to perform checks on it, or keep track of the
> parent session id when submitting the repair and query by primary key.
>
> What you could probably do to health-check that your nodes are repaired on
> time is to check, for each table (the timestamp cutoff here is illustrative
> and would be computed client side as now minus gc_grace_seconds/2):
>
> select * from repair_history where keyspace_name = 'ks' and
> columnfamily_name = 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2);
>
> And then verify for each node if all of its ranges have been repaired in
> this period, and send an alert otherwise. You can find out a node's ranges
> by querying JMX via StorageServiceMBean.getRangeToEndpointMap.
>
> To make this task a bit simpler you could probably add a secondary index
> to the participants column of the repair_history table with:
>
> CREATE INDEX myindex ON system_distributed.repair_history (participants) ;
>
> and check each node status individually with:
>
> select * from repair_history where keyspace_name = 'ks' and
> columnfamily_name = 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2)
> AND participants CONTAINS 'node_IP';
>
>
>
> 2016-02-25 16:22 GMT-03:00 Jimmy Lin :
>
>> hi Paulo,
>>
>> one more follow up ... :)
>>
>> I noticed these tables are supposed to be replicated to all nodes in the
>> cluster, and they are not per-node specific.
>>
>> how does it work when a repair job targets only the local DC vs all DCs?
>> is there any column or flag I can tell the difference from?
>> or does it actually matter?
>>
>>  thanks
>>
>>
>>
>>
>> Sent from my iPhone
>>
>> On Feb 25, 2016, at 10:37 AM, Paulo Motta 
>> wrote:
>>
>> > why does each repair job execution have 2 entries? I thought it would
>> > be one entry, beginning with the started_at column filled, and when it
>> > completed, the finished_at column would be filled.
>>
>> that's correct, I was mistaken!
>>
>> > Also, if my cluster has more than 1 keyspace, then the way this table
>> > is structured, it will have multiple entries, one for each keyspace_name
>> > value, no? thanks
>>
>> right, because repair sessions in different keyspaces will have different
>> repair session ids.
>>
>> 2016-02-25 15:04 GMT-03:00 Jimmy Lin :
>>
>>> hi Paulo,
>>>
>>> follow up on the # of entries question...
>>>
>>> why does each repair job execution have 2 entries?
>>> I thought it would be one entry, beginning with the started_at column filled,
>>> and when it completed, the finished_at column would be filled.
>>>
>>> Also, if my cluster has more than 1 keyspace, then the way this table is
>>> structured, it will have multiple entries, one for each keyspace_name
>>> value, no?
>>>
>>> thanks
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 25, 2016, at 5:48 AM, Paulo Motta 
>>> wrote:
>>>
>>> Hello Jimmy,
>>>
>>> The parent_repair_history table keeps track of start and finish
>>> information of a repair session.  The other table repair_history keeps
>>> track of repair status as it progresses. So, you must first query the
>>> parent_repair_history table to check if a repair started and finished, as
>>> well as its duration, and inspect the repair_history table to troubleshoot
>>> more specific details of a given repair session.
>>>
>>> Answering your questions below:
>>>
>>> > Is every invocation of nodetool repair recorded as one entry in the
>>> > parent_repair_history CF, regardless of whether it is across-DC, a local
>>> > node repair, or uses other options?
>>>
>>> Actually two entries, one for start and one for finish.
>>>
>>> > A repair job is done only if the "finished" column contains a value? And a
>>> > repair job is successfully done only if there is no value in
>>> > exception_messages or exception_stacktrace?
>>>
>>> correct
>>>
>>> > what is the purpose of the successful_ranges column? do I have to check
>>> > that they all match requested_ranges to ensure a successful run?
>>>
>>> correct
>>>
>>> -
>>> > Ultimately, how to find out 

Re: Checking replication status

2016-02-29 Thread Jimmy Lin
hi Bryan,
I guess I want to find out if there is any way to tell when data will
become consistent again in both cases.

If the node is down for less than the max_hint_window (say 2 hours out of a
3 hr max), is there any way to check the logs or JMX etc. to see whether the
hint queue size is back to zero or close to it?
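
One rough check, assuming a 2.x cluster where undelivered hints are stored in
the system.hints table (run it against each node that stayed up during the
outage; a count of 0 means that node has no pending hints left to deliver):

SELECT count(*) FROM system.hints;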


If a node goes down for longer than the max_hint_window time (say 4 hrs >
our 3 hr max), we run a repair job. What is the correct nodetool repair
syntax to use? In particular, what is the difference between -local and
-dc? They both seem to indicate repairing nodes within a datacenter, but
after a cross-DC network outage we want to repair nodes across DCs, right?
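
For comparison, a sketch of the two invocations (hypothetical keyspace name;
my understanding of the 2.1 options is that -local restricts repair to
replicas in the local datacenter, -dc restricts it to the named datacenter,
and leaving both off repairs against replicas in every datacenter):

nodetool repair -local my_keyspace   # only replicas in this node's own datacenter
nodetool repair my_keyspace          # default: replicas in all datacenters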

thanks



On Fri, Feb 26, 2016 at 3:38 PM, Bryan Cheng  wrote:

> Hi Jimmy,
>
> If you sustain a long downtime, repair is almost always the way to go.
>
> It seems like you're asking to what extent a cluster is able to
> recover/resync a downed peer.
>
> A peer will not attempt to reacquire all the data it has missed while
> being down. Recovery happens in a few ways:
>
> 1) Hints: Assuming that there are enough peers to satisfy your quorum
> requirements on write, the live peers will queue up these operations for up
> to max_hint_window_in_ms (from cassandra.yaml). These hints will be
> delivered once the peer recovers.
> 2) Read repair: There is a probability that read repair will happen,
> meaning that a query will trigger data consistency checks and updates _on
> the query being performed_.
> 3) Repair.
>
> If a machine goes down for longer than max_hint_window_in_ms, AFAIK you
> _will_ have missing data. If you cannot tolerate this situation, you need
> to take a look at your tunable consistency and/or trigger a repair.
>
> On Thu, Feb 25, 2016 at 7:26 PM, Jimmy Lin  wrote:
>
>> So far they are not long, just some config change and restart.
>> If it is a 2 hr downtime for whatever reason, is a repair a better
>> option than trying to figure out whether the replication sync finished or not?
>>
>> On Thu, Feb 25, 2016 at 1:09 PM, daemeon reiydelle 
>> wrote:
>>
>>> Hmm. What are your processes when a node comes back after "a long
>>> offline"? Long enough to take the node offline and do a repair? Run the
>>> risk of serving stale data? Parallel repairs? ???
>>>
>>> So, what sort of time frames are "a long time"?
>>>
>>>
>>> *Daemeon C.M. Reiydelle*
>>> USA (+1) 415.501.0198
>>> London (+44) (0) 20 8144 9872
>>>
>>> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin  wrote:
>>>
 hi all,

 what are the better ways to check the overall replication status of a
 cassandra cluster?

 Within a single DC, unless a node is down for a long time, most of the time
 I feel it is pretty much a non-issue and things get replicated pretty fast.
 But when a node comes back from a long offline period, is there a way to
 check that the node has finished its data sync with the other nodes?

 Now across DCs, we have frequent VPN outages (sometimes short, sometimes
 long) between DCs; I would also like to know if there is a way to see how
 the replication between DCs is catching up under this condition.

 Also, if I understand correctly, the only guaranteed way to make sure data
 is synced is to run a complete repair job, is that correct? I am trying to
 see if there is a way to "force a quick replication sync" between DCs after
 a VPN outage. Or maybe this is unnecessary, as Cassandra will catch up as
 fast as it can, and there is nothing else we (system admins) can do to make
 it faster or better?



 Sent from my iPhone

>>>
>>>
>>
>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Noorul Islam Kamal Malmiyoda
Hello Marcus,

I altered the table to set timestamp_resolution to 'MICROSECONDS'. I
waited for some time, but the sstable count did not come down. Do you
think I should run a specific command to reduce the number of sstables
after setting this?

Thanks and Regards
Noorul


On Mon, Feb 29, 2016 at 7:22 PM, Marcus Eriksson  wrote:
> why do you have 'timestamp_resolution': 'MILLISECONDS'? It should be left as
> default (MICROSECONDS) unless you do inserts with an explicit
> "USING TIMESTAMP" clause, see
> https://issues.apache.org/jira/browse/CASSANDRA-11041
>
> On Mon, Feb 29, 2016 at 2:36 PM, Noorul Islam K M  wrote:
>>
>>
>> Hi all,
>>
>> We are using below compaction settings for a table
>>
>> compaction = {'timestamp_resolution': 'MILLISECONDS',
>> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
>> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>>
>> But it is creating too many sstables. Currently number of sstables
>> is 4. We have been injecting data for the last three days.
>>
>> We have set the compactionthroughput to 128 MB/s
>>
>> $ nodetool getcompactionthroughput
>>
>> Current compaction throughput: 128 MB/s
>>
>> But this is not helping.
>>
>> How can we control the number of sstables in this case?
>>
>> Thanks and Regards
>> Noorul
>
>


Re: Practical limit on number of column families

2016-02-29 Thread Robert Wille
Yes, there is memory overhead for each column family, effectively limiting the 
number of column families. The general wisdom is that you should limit yourself 
to a few hundred.

Robert

On Feb 29, 2016, at 10:30 AM, Fernando Jimenez wrote:

Hi all

I have a use case for Cassandra that would require creating a large number of 
column families. I have found references to early versions of Cassandra where 
each column family would require a fixed amount of memory on all nodes, 
effectively imposing an upper limit on the total number of CFs. I have also 
seen rumblings that this may have been fixed in later versions.

To put the question to rest, I have setup a DSE sandbox and created some code 
to generate column families populated with 3,000 entries each.

Unfortunately I have now hit this issue: 
https://issues.apache.org/jira/browse/CASSANDRA-9291

So I will have to retest against Cassandra 3.0 instead

However, I would like to understand the limitations regarding creation of 
column families.

* Is there a practical upper limit?
* is this a fixed limit, or does it scale as more nodes are added into the 
cluster?
* Is there a difference between one keyspace with thousands of column families, 
vs thousands of keyspaces with only a few column families each?

I haven’t found any hard evidence/documentation to help me here, but if you can 
point me in the right direction, I will oblige and RTFM away.

Many thanks for your help!

Cheers
FJ





Practical limit on number of column families

2016-02-29 Thread Fernando Jimenez
Hi all

I have a use case for Cassandra that would require creating a large number of 
column families. I have found references to early versions of Cassandra where 
each column family would require a fixed amount of memory on all nodes, 
effectively imposing an upper limit on the total number of CFs. I have also 
seen rumblings that this may have been fixed in later versions.

To put the question to rest, I have setup a DSE sandbox and created some code 
to generate column families populated with 3,000 entries each.

Unfortunately I have now hit this issue: 
https://issues.apache.org/jira/browse/CASSANDRA-9291 


So I will have to retest against Cassandra 3.0 instead

However, I would like to understand the limitations regarding creation of 
column families. 

* Is there a practical upper limit? 
* is this a fixed limit, or does it scale as more nodes are added into 
the cluster? 
* Is there a difference between one keyspace with thousands of column 
families, vs thousands of keyspaces with only a few column families each?

I haven’t found any hard evidence/documentation to help me here, but if you can 
point me in the right direction, I will oblige and RTFM away.

Many thanks for your help!

Cheers
FJ




Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Noorul Islam K M
Alain RODRIGUEZ  writes:

> Might be due to this:
>
> Fixed in 2.1.12 (Assuming you are using C*2.1):
> https://issues.apache.org/jira/browse/CASSANDRA-10422
>
> Some question to have more context:
>
>
>1. What C* version are you using?

We are using DSE 4.8.3, hence Apache Cassandra 2.1.12.1046

>2. Do you use vnodes?

Yes

>3. How many vnodes per node?

32

>4. How many nodes / DC do you have?

We have 3 DCs: cassandra, spark and solr.

The keyspace has RF 3 in the cassandra and solr DCs.


>5. How do you run repairs (tool & command)?

We enabled the auto-repair feature provided by OpsCenter.

Thanks and Regards
Noorul


> C*heers,
>
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-02-29 15:50 GMT+01:00 Noorul Islam Kamal Malmiyoda :
>
>> Yes, we have enabled it on OpsCenter. Is that the reason?
>> On Feb 29, 2016 8:07 PM, "Dominik Keil" 
>> wrote:
>>
>>> Are you using incremental repairs?
>>>
>>> Am 29.02.2016 um 14:36 schrieb Noorul Islam K M:
>>>
>>> Hi all,
>>>
>>> We are using below compaction settings for a table
>>>
>>> compaction = {'timestamp_resolution': 'MILLISECONDS',
>>> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
>>> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>>>
>>> But it is creating too many sstables. Currently number of sstables
>>> is 4. We have been injecting data for the last three days.
>>>
>>> We have set the compactionthroughput to 128 MB/s
>>>
>>> $ nodetool getcompactionthroughput
>>>
>>> Current compaction throughput: 128 MB/s
>>>
>>> But this is not helping.
>>>
>>> How can we control the number of sstables in this case?
>>>
>>> Thanks and Regards
>>> Noorul
>>>
>>>
>>
>>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Alain RODRIGUEZ
Might be due to this:

Fixed in 2.1.12 (Assuming you are using C*2.1):
https://issues.apache.org/jira/browse/CASSANDRA-10422

Some questions to get more context:


   1. What C* version are you using?
   2. Do you use vnodes?
   3. How many vnodes per node?
   4. How many nodes / DC do you have?
   5. How do you run repairs (tool & command)?


C*heers,

---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-02-29 15:50 GMT+01:00 Noorul Islam Kamal Malmiyoda :

> Yes, we have enabled it on OpsCenter. Is that the reason?
> On Feb 29, 2016 8:07 PM, "Dominik Keil" 
> wrote:
>
>> Are you using incremental repairs?
>>
>> Am 29.02.2016 um 14:36 schrieb Noorul Islam K M:
>>
>> Hi all,
>>
>> We are using below compaction settings for a table
>>
>> compaction = {'timestamp_resolution': 'MILLISECONDS',
>> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
>> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>>
>> But it is creating too many sstables. Currently number of sstables
>> is 4. We have been injecting data for the last three days.
>>
>> We have set the compactionthroughput to 128 MB/s
>>
>> $ nodetool getcompactionthroughput
>>
>> Current compaction throughput: 128 MB/s
>>
>> But this is not helping.
>>
>> How can we control the number of sstables in this case?
>>
>> Thanks and Regards
>> Noorul
>>
>>
>
>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Noorul Islam Kamal Malmiyoda
Yes, we have enabled it on OpsCenter. Is that the reason?
On Feb 29, 2016 8:07 PM, "Dominik Keil"  wrote:

> Are you using incremental repairs?
>
> Am 29.02.2016 um 14:36 schrieb Noorul Islam K M:
>
>
> Hi all,
>
> We are using below compaction settings for a table
>
> compaction = {'timestamp_resolution': 'MILLISECONDS',
> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>
> But it is creating too many sstables. Currently number of sstables
> is 4. We have been injecting data for the last three days.
>
> We have set the compactionthroughput to 128 MB/s
>
> $ nodetool getcompactionthroughput
>
> Current compaction throughput: 128 MB/s
>
> But this is not helping.
>
> How can we control the number of sstables in this case?
>
> Thanks and Regards
> Noorul
>
>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Dominik Keil
Are you using incremental repairs?

Am 29.02.2016 um 14:36 schrieb Noorul Islam K M:
> Hi all,
>
> We are using below compaction settings for a table
>
> compaction = {'timestamp_resolution': 'MILLISECONDS',
> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>
> But it is creating too many sstables. Currently number of sstables
> is 4. We have been injecting data for the last three days.
>
> We have set the compactionthroughput to 128 MB/s
>
> $ nodetool getcompactionthroughput
>
> Current compaction throughput: 128 MB/s
>
> But this is not helping. 
>
> How can we control the number of sstables in this case?
>
> Thanks and Regards
> Noorul

-- 
*Dominik Keil*
Phone: + 49 (0) 621 150 207 31
Mobile: + 49 (0) 151 626 602 14

Movilizer GmbH
Julius-Hatry-Strasse 1
68163 Mannheim
Germany

movilizer.com

Company's registered office: Mannheim HRB: 700323 / Country Court: Mannheim 
Managing Directors: Alberto Zamora, Jörg Bernauer, Oliver Lesche Please 
inform us immediately if this e-mail and/or any attachment was transmitted 
incompletely or was not intelligible.

This e-mail and any attachment is for authorized use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be 
copied, disclosed to, retained or used by any other party. If you are not 
an intended recipient then please promptly delete this e-mail and any 
attachment and all copies and inform the sender.


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Marcus Eriksson
why do you have 'timestamp_resolution': 'MILLISECONDS'? It should be left
as default (MICROSECONDS) unless you do inserts with an explicit
"USING TIMESTAMP" clause, see
https://issues.apache.org/jira/browse/CASSANDRA-11041
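
For example, switching the setting back (hypothetical keyspace/table names;
ALTER TABLE replaces the whole compaction map, so the other subproperties
from the original settings are restated here):

ALTER TABLE my_keyspace.my_table WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'timestamp_resolution': 'MICROSECONDS',
    'max_sstable_age_days': '365',
    'base_time_seconds': '60'};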

On Mon, Feb 29, 2016 at 2:36 PM, Noorul Islam K M  wrote:

>
> Hi all,
>
> We are using below compaction settings for a table
>
> compaction = {'timestamp_resolution': 'MILLISECONDS',
> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>
> But it is creating too many sstables. Currently number of sstables
> is 4. We have been injecting data for the last three days.
>
> We have set the compactionthroughput to 128 MB/s
>
> $ nodetool getcompactionthroughput
>
> Current compaction throughput: 128 MB/s
>
> But this is not helping.
>
> How can we control the number of sstables in this case?
>
> Thanks and Regards
> Noorul
>


Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Noorul Islam K M

Hi all,

We are using below compaction settings for a table

compaction = {'timestamp_resolution': 'MILLISECONDS',
'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}

But it is creating too many sstables. Currently number of sstables
is 4. We have been injecting data for the last three days.

We have set the compactionthroughput to 128 MB/s

$ nodetool getcompactionthroughput

Current compaction throughput: 128 MB/s

But this is not helping. 

How can we control the number of sstables in this case?

Thanks and Regards
Noorul


Re: Replacing disks

2016-02-29 Thread Michał Łowicki
On Mon, Feb 29, 2016 at 8:52 AM, Alain RODRIGUEZ  wrote:

> I wrote that a few days ago:
> http://thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html
>
> I believe this might help you.
>

Yes, looks promising. Thanks!


> C*heers,
> ---
>
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> On 28 Feb 2016 at 15:17, "Clint Martin" <
> clintlmar...@coolfiretechnologies.com> wrote:
>
>> Code wise, I am not completely familiar with what accomplishes the
>> behavior.  But my understanding and experience is that Cass 2.1 picks the
>> drive with the most free space when picking a destination for a compaction
>> operation.
>> (This is an overly simplistic description. Reality is always more
>> nuanced. DataStax had a blog post that describes this better, as well as
>> the limitations of the algorithm in 2.1 which are addressed in the 3.x
>> releases.)
>>
>> Clint
>> On Feb 28, 2016 10:11 AM, "Michał Łowicki"  wrote:
>>
>>>
>>>
>>> On Sun, Feb 28, 2016 at 4:00 PM, Clint Martin <
>>> clintlmar...@coolfiretechnologies.com> wrote:
>>>
 Your plan for replacing your 200gb drive sounds good to me. Since you
 are running jbod, I wouldn't worry about manually redistributing data from
 your other disk to the new one. Cassandra will do that for you as it
 performs compaction.

>>>
>>> Is this done by pickWriteableDirectory?
>>>
 While you're doing the drive change, you need to complete the swap and
 restart of the node before the hinted handoff window expires on the other
 nodes. If you do not complete in time, you'll want to perform a repair on
 the node.

>>>
>>> Yes. Thanks!
>>>
>>>


 Clint
 On Feb 28, 2016 9:33 AM, "Michał Łowicki"  wrote:

> Hi,
>
> I have two disks on a single box (500GB + 200GB). data_file_directories
> in cassandra.yaml has two entries. I would like to replace the 200GB disk
> with a 500GB one, as it is running out of space, and to align it with the
> others we have in the cluster. The plan is to stop C*, attach the new disk,
> move the data from the 200GB disk to the new one and mount it at the same
> point in the hierarchy. When done, start C*.
>
> Additionally I would like to move some data from the old 500GB to the
> new one to distribute used disk space equally. Probably all related files
> for a single SSTable should be moved, i.e.:
>
> foo-bar-ka-1630184-CompressionInfo.db
>
> foo-bar-ka-1630184-Data.db
>
> foo-bar-ka-1630184-Digest.sha1
>
> foo-bar-ka-1630184-Filter.db
>
> foo-bar-ka-1630184-Index.db
>
> foo-bar-ka-1630184-Statistics.db
>
> foo-bar-ka-1630184-Summary.db
>
> foo-bar-ka-1630184-TOC.txt
>
> Is this something that should work, or do you see any obstacles? (C*
> 2.1.13).
> --
> BR,
> Michał Łowicki
>

>>>
>>>
>>> --
>>> BR,
>>> Michał Łowicki
>>>
>>


-- 
BR,
Michał Łowicki