CVE-2023-30601: Apache Cassandra: Privilege escalation when enabling FQL/Audit logs
Severity: important

Affected versions:
- Apache Cassandra 4.0.0 through 4.0.9
- Apache Cassandra 4.1.0 through 4.1.1

Description:
Privilege escalation when enabling FQL/Audit logs allows a user with JMX access to run arbitrary commands as the user running Apache Cassandra.

This issue affects Apache Cassandra: from 4.0.0 through 4.0.9, from 4.1.0 through 4.1.1.

WORKAROUND
The vulnerability requires nodetool/JMX access to be exploitable; disable access for any non-trusted users.

MITIGATION
Upgrade to 4.0.10 or 4.1.2 and leave the new FQL/audit log configuration property allow_nodetool_archive_command as false.

This issue is being tracked as CASSANDRA-18550.

Credit: Gal Elbaz at Oligo (finder)

References:
https://cassandra.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-30601
https://issues.apache.org/jira/browse/CASSANDRA-18550
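After upgrading, the mitigated configuration would look something like this in cassandra.yaml (the property name comes from the advisory; the exact nesting under the FQL and audit-log option groups may differ between 4.0.10 and 4.1.2, so treat this as a sketch):

```yaml
# Sketch only: allow_nodetool_archive_command is the property named by the
# advisory; leaving it false keeps nodetool-driven archive commands disabled.
full_query_logging_options:
    allow_nodetool_archive_command: false
audit_logging_options:
    allow_nodetool_archive_command: false
```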
Re: Failed service startup
This looks like https://issues.apache.org/jira/browse/CASSANDRA-17273. IIRC you can merge the two files, making sure all ADD and REMOVE records are present in both; I think you would need to add `ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67417-big-,0,8][3940068469]` to the data02 transaction log file. Make sure you back up all involved sstables before trying this.

/Marcus

On Mon, Dec 12, 2022 at 02:40:25PM +, Marc Hoppins wrote:
> Hi, all,
>
> We had a failed HDD on one node. The node was shut down pending repair.
> There are now 4 other nodes with Cassandra not running and unable to start up
> due to the following kinds of error. Is this kind of thing due to the
> original stopped node?
>
> ERROR [main] 2022-12-12 14:58:10,838 LogReplicaSet.java:145 - Mismatched line
> in file nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log:
> got 'ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67417-big-,0,8][3940068469]'
> expected 'ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67418-big-,0,8][2798461787]',
> giving up
> ERROR [main] 2022-12-12 14:58:10,838 LogFile.java:161 - Failed to read
> records for transaction log
> [nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log in
> /mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c,
> /mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c]
> ERROR [main] 2022-12-12 14:58:10,840 LogTransaction.java:551 - Unexpected
> disk state: failed to read transaction log
> [nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log in
> /mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c,
> /mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c]
> Files and contents follow:
> /mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log
> ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67416-big-,0,8][1963077611]
> ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67418-big-,0,8][2798461787]
> REMOVE:[/mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67405-big-,1665045804823,8][1428695358]
> REMOVE:[/mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67402-big-,1665050002894,8][2407633150]
> COMMIT:[,0,0][2613697770]
>
> /mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log
> ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67416-big-,0,8][1963077611]
> ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67417-big-,0,8][3940068469]   ***Does not match in first replica file
> ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67418-big-,0,8][2798461787]
> REMOVE:[/mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67405-big-,1665045804823,8][1428695358]
> REMOVE:[/mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67402-big-,1665050002894,8][2407633150]
> COMMIT:[,0,0][2613697770]
>
> ERROR [main] 2022-12-12 14:58:10,841 CassandraDaemon.java:911 - Cannot remove
> temporary or obsoleted files for hades.prod_md5_sha1 due to a problem with
> transaction log files. Please check records with problems in the log messages
> above and fix them. Refer to the 3.0 upgrading instructions in NEWS.txt for a
> description of transaction log files.
> Sstableutil only returned
>
> ERROR 15:35:52,217 Mismatched line in file
> nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log:
> got 'ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67417-big-,0,8][3940068469]'
> expected 'ADD:[/mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c/nb-67418-big-,0,8][2798461787]',
> giving up
> ERROR 15:35:52,219 Failed to read records for transaction log
> [nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log in
> /mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c,
> /mnt/data01/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c]
> ERROR 15:35:52,220 Unexpected disk state: failed to read transaction log
> [nb_txn_anticompactionafterrepair_5865e530-7a18-11ed-950f-954f6819a607.log in
> /mnt/data02/cassandra/data/hades/prod_md5_sha1-bb5bdca002b111edb9761fc3bb7c847c,
>
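The merge Marcus describes, making both replica copies of the transaction log contain the same ADD/REMOVE records, can be illustrated with a small self-contained sketch. Temporary files stand in for the real /mnt/data01 and /mnt/data02 logs, and the record strings are shortened stand-ins; on a real node, always back up the sstables and both log files first.

```python
import os
import tempfile

# Toy replicas of one sstable transaction log; the data02 copy is missing
# one ADD record, mirroring the mismatch in the error above.
d = tempfile.mkdtemp()
data01 = os.path.join(d, "data01.log")
data02 = os.path.join(d, "data02.log")
with open(data01, "w") as f:
    f.write("ADD:[nb-67416][1963077611]\nADD:[nb-67417][3940068469]\n"
            "ADD:[nb-67418][2798461787]\nCOMMIT:[,0,0][2613697770]\n")
with open(data02, "w") as f:
    f.write("ADD:[nb-67416][1963077611]\n"
            "ADD:[nb-67418][2798461787]\nCOMMIT:[,0,0][2613697770]\n")

# Back up the file we are about to modify.
with open(data02) as f:
    original = f.read()
with open(data02 + ".bak", "w") as f:
    f.write(original)

# Here data01 holds a superset of the records, so the simplest merge is to
# overwrite the incomplete replica with it; in general every ADD and REMOVE
# record must end up in both files, in the same order.
with open(data01) as src, open(data02, "w") as dst:
    dst.write(src.read())

assert open(data01).read() == open(data02).read()
```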
CVE-2021-44521: Apache Cassandra: Remote code execution for scripted UDFs
Severity: high

Description:
When running Apache Cassandra with the following configuration:

enable_user_defined_functions: true
enable_scripted_user_defined_functions: true
enable_user_defined_functions_threads: false

it is possible for an attacker to execute arbitrary code on the host. The attacker would need enough permissions to create user defined functions in the cluster to be able to exploit this. Note that this configuration is documented as unsafe, and will continue to be considered unsafe after this CVE.

This issue is being tracked as CASSANDRA-17352.

Mitigation:
Set `enable_user_defined_functions_threads: true` (this is the default), or upgrade:
3.0 users should upgrade to 3.0.26
3.11 users should upgrade to 3.11.12
4.0 users should upgrade to 4.0.2

Credit: This issue was discovered by Omer Kaspi of the JFrog Security vulnerability research team.
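For reference, the safe combination described in the mitigation would look like this in cassandra.yaml (the option names are taken from the advisory itself; everything else is illustrative):

```yaml
enable_user_defined_functions: true
enable_scripted_user_defined_functions: false
# The key setting: run UDFs on separate, sandboxed threads (the default).
enable_user_defined_functions_threads: true
```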
Re: What is the cons of changing LCS fanout option to 100 or even bigger?
The problem would be that for every file you flush, you would recompact all of L1 - files are flushed to L0, then compacted together with all overlapping files in L1.

On Tue, Sep 18, 2018 at 4:53 AM 健 戴 wrote:
> Hi,
>
> I have one table with 2T of data saved on each C* node.
> With LCS, the data will span 5 levels:
>
> - L1: 160M * 10 = 1.6G
> - L2: 1.6G * 10 = 16G
> - L3: 16G * 10 = 160G
> - L4: 160G * 10 = 1.6T
> - L5: 1.6T * 10 = 16T
>
> When I looked into the source code, I found an option: fanout_size.
>
> The default value is 10. What about changing this value to 100? Then the
> levels reduce to 3:
>
> - L1: 160M * 100 = 16G
> - L2: 16G * 100 = 1.6T
> - L3: 1.6T * 100 = 160T
>
> Or could I even set it to 1, so all files are in the same level?
> Would that be better?
> What are the cons of a bigger value for this option?
>
> Thanks for your help.
>
> Jian
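The trade-off can be made concrete with a quick back-of-the-envelope calculation (the numbers follow the question's 160 MB sstable size; this is an illustration, not Cassandra code):

```python
def level_capacity_mb(level, fanout, sstable_mb=160):
    """Target capacity of LCS level N, roughly sstable_size * fanout**N."""
    return sstable_mb * fanout ** level

# A bigger fanout means fewer levels for the same data...
assert level_capacity_mb(2, 10) == 16_000        # L2 at fanout 10: ~16 GB
assert level_capacity_mb(2, 100) == 1_600_000    # L2 at fanout 100: ~1.6 TB

# ...but every sstable promoted out of L0 is compacted together with all
# overlapping L1 sstables, and L1 now holds up to `fanout` of them, so the
# compaction work per flush grows roughly linearly with the fanout.
```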
Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)
It could also be https://issues.apache.org/jira/browse/CASSANDRA-2503 On Mon, Sep 17, 2018 at 4:04 PM Jeff Jirsa wrote: > > > On Sep 17, 2018, at 2:34 AM, Oleksandr Shulgin < > oleksandr.shul...@zalando.de> wrote: > > On Tue, Sep 11, 2018 at 8:10 PM Oleksandr Shulgin < > oleksandr.shul...@zalando.de> wrote: > >> On Tue, 11 Sep 2018, 19:26 Jeff Jirsa, wrote: >> >>> Repair or read-repair >>> >> >> Could you be more specific please? >> >> Why any data would be streamed in if there is no (as far as I can see) >> possibilities for the nodes to have inconsistency? >> > > Again, given that the tables are not updated anymore from the application > and we have repaired them successfully multiple times already, how can it > be that any inconsistency would be found by read-repair or normal repair? > > We have seen this on a number of nodes, including SSTables written at the > time there was guaranteed no repair running. > > > Not obvious to me where the sstable is coming from - you’d have to look in > the logs. If it’s read repair, it’ll be created during a memtable flush. If > it’s nodetool repair, it’ll be streamed in. It could also be compaction > (especially tombstone compaction), in which case it’ll be in the compaction > logs and it’ll have an sstable ancestor in the metadata. > > >
Re: Index summary redistribution seems to block all compactions
Anything in the logs? It *could* be https://issues.apache.org/jira/browse/CASSANDRA-13873

On Tue, Oct 24, 2017 at 11:18 PM, Sotirios Delimanolis < sotodel...@yahoo.com.invalid> wrote:
> On a Cassandra 2.2.11 cluster, I noticed estimated compactions
> accumulating on one node. nodetool compactionstats showed the following:
>
> compaction type               keyspace  table        completed  total      unit   progress
> Compaction                    ks1       some_table   204.68 MB  204.98 MB  bytes  99.86%
> Index summary redistribution  *null*    *null*       457.72 KB  950 MB     bytes  *0.05%*
> Compaction                    ks1       some_table   461.61 MB  461.95 MB  bytes  99.93%
> Tombstone Compaction          ks1       some_table   618.34 MB  618.47 MB  bytes  99.98%
> Compaction                    ks1       some_table   378.37 MB  380 MB     bytes  99.57%
> Tombstone Compaction          ks1       some_table   326.51 MB  327.63 MB  bytes  99.66%
> Tombstone Compaction          ks2       other_table  29.38 MB   29.38 MB   bytes  100.00%
> Tombstone Compaction          ks1       some_table   503.4 MB   507.28 MB  bytes  99.24%
> Compaction                    ks1       some_table   353.44 MB  353.47 MB  bytes  99.99%
>
> They had been like this for a while (all different tables). A thread dump
> showed all 8 CompactionExecutor threads looking like
>
> "CompactionExecutor:6" #84 daemon prio=1 os_prio=4 tid=0x7f5771172000 nid=0x7646 waiting on condition [0x7f578847b000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x0005fe5656e8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
>         at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>         at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:390)
>         at org.apache.cassandra.db.SystemKeyspace.forceBlockingFlush(SystemKeyspace.java:593)
>         at org.apache.cassandra.db.SystemKeyspace.finishCompaction(SystemKeyspace.java:368)
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:205)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:80)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:257)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> A MemtablePostFlush thread was awaiting some flush count down latch
>
> "MemtablePostFlush:1" #30 daemon prio=5 os_prio=0 tid=0x7f57705dac00 nid=0x75bf waiting on condition [0x7f578a8fb000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x000573da6c90> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>         at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1073)
>         at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1026)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
Re: Restoring a table cassandra - compactions
This is done to avoid overlap in levels > 0. There is this though: https://issues.apache.org/jira/browse/CASSANDRA-13425

If you are restoring an entire node, starting with an empty data directory, you should probably stop cassandra, copy the snapshot in, and restart; that will keep the levels.

On Thu, Jun 1, 2017 at 4:25 PM, Jean Carlo wrote:
> Hello.
>
> During the restore of a table using its snapshot and nodetool refresh, I
> could see that cassandra starts to make a lot of compactions (depending on
> the size of the data).
>
> I wanted to know why, and I found this in the code of cassandra 2.1.14,
> for CASSANDRA-4872:
>
> +    // force foreign sstables to level 0
> +    try
> +    {
> +        if (new File(descriptor.filenameFor(Component.STATS)).exists())
> +        {
> +            SSTableMetadata oldMetadata = SSTableMetadata.serializer.deserialize(descriptor);
> +            LeveledManifest.mutateLevel(oldMetadata, descriptor, descriptor.filenameFor(Component.STATS), 0);
> +        }
> +    }
> +    catch (IOException e)
>
> This is very interesting, and I wanted to know whether this was coded taking
> into account only the case of a migration from STCS to LCS, or whether it is
> not pertinent for the LCS-to-LCS case.
>
> In my case, I use nodetool refresh not only to restore a table but also to
> make an exact copy of any LCS table. So I think the levels do not need to
> change.
>
> @Marcus Can you be so kind as to clarify this for me please?
>
> Thank you very much in advance
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
Re: dtests jolokia fails to attach
It is this: "-XX:+PerfDisableSharedMem" - in your dtest you need to do "remove_perf_disable_shared_mem(node1)" before starting the node /Marcus On Thu, Oct 6, 2016 at 8:30 AM, Benjamin Rothwrote: > Maybe additional information, this is the CS command line for ccm node1: > > br 20376 3.2 8.6 2331136 708308 pts/5 Sl 06:10 0:30 java > -Xloggc:/home/br/.ccm/test/node1/logs/gc.log -ea -XX:+UseThreadPriorities > -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k > -XX:StringTableSize=103 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking > -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem > -Djava.net.preferIPv4Stack=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSWaitDuration=1 -XX:+CMSParallelInitialMarkEnabled > -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xms500M -Xmx500M -Xmn50M > -XX:+UseCondCardMark > -XX:CompileCommandFile=/home/br/.ccm/test/node1/conf/hotspot_compiler > -javaagent:/home/br/repos/cassandra/lib/jamm-0.3.0.jar > -Dcassandra.jmx.local.port=7100 > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password > -Djava.library.path=/home/br/repos/cassandra/lib/sigar-bin > -Dcassandra.migration_task_wait_in_seconds=6 -Dcassandra.libjemalloc=/usr/ > lib/x86_64-linux-gnu/libjemalloc.so.1 -Dlogback.configurationFile=logback.xml > -Dcassandra.logdir=/var/log/cassandra > -Dcassandra.storagedir=/home/br/repos/cassandra/data > -Dcassandra-pidfile=/home/br/.ccm/test/node1/cassandra.pid -cp > /home/br/.ccm/test/node1/conf:/home/br/repos/cassandra/ > 
build/classes/main:/home/br/repos/cassandra/build/classes/ > thrift:/home/br/repos/cassandra/lib/HdrHistogram-2.1.9.jar:/home/br/repos/ > cassandra/lib/ST4-4.0.8.jar:/home/br/repos/cassandra/lib/ > airline-0.6.jar:/home/br/repos/cassandra/lib/antlr- > runtime-3.5.2.jar:/home/br/repos/cassandra/lib/asm-5.0.4. > jar:/home/br/repos/cassandra/lib/caffeine-2.2.6.jar:/home/ > br/repos/cassandra/lib/cassandra-driver-core-3.0.1- > shaded.jar:/home/br/repos/cassandra/lib/commons-cli-1.1. > jar:/home/br/repos/cassandra/lib/commons-codec-1.2.jar:/ > home/br/repos/cassandra/lib/commons-lang3-3.1.jar:/home/ > br/repos/cassandra/lib/commons-math3-3.2.jar:/home/br/repos/cassandra/lib/ > compress-lzf-0.8.4.jar:/home/br/repos/cassandra/lib/ > concurrent-trees-2.4.0.jar:/home/br/repos/cassandra/lib/ > concurrentlinkedhashmap-lru-1.4.jar:/home/br/repos/ > cassandra/lib/disruptor-3.0.1.jar:/home/br/repos/cassandra/ > lib/ecj-4.4.2.jar:/home/br/repos/cassandra/lib/guava-18. > 0.jar:/home/br/repos/cassandra/lib/high-scale-lib- > 1.0.6.jar:/home/br/repos/cassandra/lib/hppc-0.5.4.jar:/ > home/br/repos/cassandra/lib/jackson-core-asl-1.9.2.jar:/ > home/br/repos/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/ > home/br/repos/cassandra/lib/jamm-0.3.0.jar:/home/br/repos/ > cassandra/lib/javax.inject.jar:/home/br/repos/cassandra/ > lib/jbcrypt-0.3m.jar:/home/br/repos/cassandra/lib/jcl-over- > slf4j-1.7.7.jar:/home/br/repos/cassandra/lib/jctools- > core-1.2.1.jar:/home/br/repos/cassandra/lib/jflex-1.6.0.jar: > /home/br/repos/cassandra/lib/jna-4.0.0.jar:/home/br/repos/ > cassandra/lib/joda-time-2.4.jar:/home/br/repos/cassandra/ > lib/json-simple-1.1.jar:/home/br/repos/cassandra/lib/ > libthrift-0.9.2.jar:/home/br/repos/cassandra/lib/log4j- > over-slf4j-1.7.7.jar:/home/br/repos/cassandra/lib/logback- > classic-1.1.3.jar:/home/br/repos/cassandra/lib/logback- > core-1.1.3.jar:/home/br/repos/cassandra/lib/lz4-1.3.0.jar:/ > home/br/repos/cassandra/lib/metrics-core-3.1.0.jar:/home/ > 
br/repos/cassandra/lib/metrics-jvm-3.1.0.jar:/home/br/repos/cassandra/lib/ > metrics-logback-3.1.0.jar:/home/br/repos/cassandra/lib/ > netty-all-4.0.39.Final.jar:/home/br/repos/cassandra/lib/ > ohc-core-0.4.4.jar:/home/br/repos/cassandra/lib/ohc-core- > j8-0.4.4.jar:/home/br/repos/cassandra/lib/primitive-1.0. > jar:/home/br/repos/cassandra/lib/reporter-config-base-3.0. > 0.jar:/home/br/repos/cassandra/lib/reporter-config3-3.0.0.jar:/home/br/ > repos/cassandra/lib/sigar-1.6.4.jar:/home/br/repos/ > cassandra/lib/slf4j-api-1.7.7.jar:/home/br/repos/cassandra/ > lib/snakeyaml-1.11.jar:/home/br/repos/cassandra/lib/snappy- > java-1.1.1.7.jar:/home/br/repos/cassandra/lib/snowball- > stemmer-1.3.0.581.1.jar:/home/br/repos/cassandra/lib/stream- > 2.5.2.jar:/home/br/repos/cassandra/lib/thrift-server-0. > 3.7.jar:/home/br/repos/cassandra/lib/jsr223/*/*.jar > -Dcassandra.join_ring=True -Dcassandra.logdir=/home/br/.ccm/test/node1/logs > -Dcassandra.boot_without_jna=true
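As the command line above shows, the node is indeed started with -XX:+PerfDisableSharedMem, which prevents jolokia (and other attach-based tools) from using the JVM's shared perf memory. A toy model of what the remove_perf_disable_shared_mem helper has to achieve, namely filtering that flag out of the JVM argument list before the node starts (this is not the actual ccm/dtest implementation):

```python
PERF_FLAG = "-XX:+PerfDisableSharedMem"

def strip_perf_disable_shared_mem(jvm_args):
    """Return the JVM args without the flag, mimicking what calling
    remove_perf_disable_shared_mem(node1) arranges before node start."""
    return [arg for arg in jvm_args if arg != PERF_FLAG]

args = ["-Xms500M", PERF_FLAG, "-Xmx500M"]
assert strip_perf_disable_shared_mem(args) == ["-Xms500M", "-Xmx500M"]
```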
Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3
it could also be CASSANDRA-11412 if you have many sstables and vnodes On Wed, Jun 22, 2016 at 2:50 PM, Bhuvan Rawalwrote: > Thanks for the info Paulo, Robert. I tried further testing with other > parameters and it was prevalent. We could be either 11739, 11206. But im > spektical about 11739 because repair works well in 3.5 and 11739 seems to > be fixed for 3.7/3.0.7. > > We may possibly resolve this by increasing heap size thereby reducing some > page cache bandwidth before upgrading to higher versions. > > On Mon, Jun 20, 2016 at 10:00 PM, Paulo Motta > wrote: > >> You could also be hitting CASSANDRA-11739, which was fixed on 3.0.7 and >> could potentially cause OOMs for long-running repairs. >> >> >> 2016-06-20 13:26 GMT-03:00 Robert Stupp : >> >>> One possibility might be CASSANDRA-11206 (Support large partitions on >>> the 3.0 sstable format), which reduces heap usage for other operations >>> (like repair, compactions) as well. >>> You can verify that by setting column_index_cache_size_in_kb in c.yaml >>> to a really high value like 1000 - if you see the same behaviour in 3.7 >>> with that setting, there’s not much you can do except upgrading to 3.7 as >>> that change went into 3.6 and not into 3.0.x. >>> >>> — >>> Robert Stupp >>> @snazy >>> >>> On 20 Jun 2016, at 18:13, Bhuvan Rawal wrote: >>> >>> Hi All, >>> >>> We are running Cassandra 3.0.3 on Production with Max Heap Size of 8GB. >>> There has been a consistent issue with nodetool repair for a while and >>> we have tried issuing it with multiple options --pr, --local as well, >>> sometimes node went down with Out of Memory error and at times nodes did >>> stopped connecting any connection, even jmx nodetool commands. >>> >>> On trying with same data on 3.7 Repair Ran successfully without >>> encountering any of the above mentioned issues. I then tried increasing >>> heap to 16GB on 3.0.3 and repair ran successfully. 
>>> >>> I then analyzed memory usage during nodetool repair for 3.0.3(16GB >>> heap) vs 3.7 (8GB Heap) and 3.0.3 occupied 11-14 GB at all times, >>> whereas 3.7 spiked between 1-4.5 GB while repair runs. As they ran on >>> same dataset and unrepaired data with full repair. >>> >>> We would like to know if it is a known bug that was fixed post 3.0.3 and >>> there could be a possible way by which we can run repair on 3.0.3 without >>> increasing heap size as for all other activities 8GB works for us. >>> >>> PFA the visualvm snapshots. >>> >>> >>> 3.0.3 VisualVM Snapshot, consistent heap usage of greater than 12 GB. >>> >>> >>> >>> 3.7 VisualVM Snapshot, 8GB Max Heap and max heap usage till about 5GB. >>> >>> Thanks & Regards, >>> Bhuvan Rawal >>> >>> >>> PS: In case if the snapshots are not visible, they can be viewed from >>> the following links: >>> 3.0.3: >>> https://s31.postimg.org/4e7ifsjaz/Screenshot_from_2016_06_20_21_06_09.png >>> 3.7: >>> https://s31.postimg.org/xak32s9m3/Screenshot_from_2016_06_20_21_05_57.png >>> >>> >>> >> >
Re: Effectiveness of Scrub Operation vs SSTable previously marked in blacklist
yeah that is most likely a bug, could you file a ticket?

On Tue, Mar 22, 2016 at 4:36 AM, Michael Fong < michael.f...@ruckuswireless.com> wrote:
> Hi, all,
>
> We recently encountered a scenario under a Cassandra 2.0 deployment.
> Cassandra detected a corrupted sstable, and when we attempted to scrub the
> sstable (with all the associated sstables), the corrupted sstable was not
> included in the sstable list. This continued until we restarted Cassandra and
> performed the scrub again.
>
> After we traced the Cassandra source code, we are a bit confused about the
> effectiveness of scrubbing versus an SSTable being marked in the blacklist in
> Cassandra 2.0+.
>
> It seems that in a previous version (Cassandra 1.2), the scrub operation would
> operate on an sstable regardless of it being previously marked. However, in
> Cassandra 2.0, the function flow seems to have changed.
>
> Here is the function flow that we traced in the Cassandra 2.0 source code:
>
> From org.apache.cassandra.db.compaction.CompactionManager:
>
> public void performScrub(ColumnFamilyStore cfStore, final boolean skipCorrupted, final boolean checkData) throws InterruptedException, ExecutionException
> {
>     performAllSSTableOperation(cfStore, new AllSSTablesOperation()
>     {
>     …
>
> private void performAllSSTableOperation(final ColumnFamilyStore cfs, final AllSSTablesOperation operation) throws InterruptedException, ExecutionException
> {
>     final Iterable<SSTableReader> sstables = cfs.markAllCompacting();
>     …
>
> From org.apache.cassandra.db.ColumnFamilyStore:
>
> public Iterable<SSTableReader> markAllCompacting()
> {
>     Callable<Iterable<SSTableReader>> callable = new Callable<Iterable<SSTableReader>>()
>     {
>         public Iterable<SSTableReader> call() throws Exception
>         {
>             assert data.getCompacting().isEmpty() : data.getCompacting();
>             Iterable<SSTableReader> sstables = Lists.newArrayList(*AbstractCompactionStrategy.filterSuspectSSTables(getSSTables())*);
>             if (Iterables.isEmpty(sstables))
>                 return null;
>     …
>
> If this is true, would this flow - marking the corrupted sstable in the
> blacklist - defeat the original purpose of the scrub operation? Thanks in
> advance!
>
> Sincerely,
>
> Michael Fong
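The behaviour traced above, where markAllCompacting() passes the candidates through filterSuspectSSTables so a blacklisted (suspect) sstable never reaches scrub, can be mimicked in a few lines (a toy model, not Cassandra's code):

```python
def filter_suspect(sstables):
    """Toy stand-in for AbstractCompactionStrategy.filterSuspectSSTables:
    drop sstables already marked suspect (blacklisted after corruption)."""
    return [s for s in sstables if not s["suspect"]]

sstables = [{"name": "nb-1", "suspect": False},
            {"name": "nb-2", "suspect": True}]   # the corrupted one
scrub_candidates = filter_suspect(sstables)
# The corrupted sstable is exactly the one scrub never sees:
assert [s["name"] for s in scrub_candidates] == ["nb-1"]
```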
Re: DTCS Question
On Wed, Mar 16, 2016 at 6:49 PM, Anubhav Kale wrote:
> I am using Cassandra 2.1.13 which has all the latest DTCS fixes (it does
> STCS within the DTCS windows). It also introduced a field called
> MAX_WINDOW_SIZE which defaults to one day.
>
> So in my data folders, I may see SSTables that span beyond a day
> (generated from old data through repairs or commit logs), but whenever I
> see a message in the logs "Compacted Foo" (meaning the SSTable in question
> was definitely a result of compaction), the "Foo" SSTable should never
> have data beyond a day. Is this understanding accurate?

No - not until https://issues.apache.org/jira/browse/CASSANDRA-10496 (read for explanation)

> If we have issues with repairs pulling in old data, should MAX_WINDOW_SIZE
> instead be set to a larger value so that we don't run the risk of too many
> SSTables lying around and never getting compacted?

No, with CASSANDRA-10280 that old data will get compacted if needed (assuming you have default settings). If the remote node is correctly date tiered, the streamed sstable will also be correctly date tiered. Then that streamed sstable will be put in a time window and if there are enough sstables in that old window, we do a compaction.

/Marcus
Re: Compaction Filter in Cassandra
We don't have anything like that, do you have a specific use case in mind? Could you create a JIRA ticket and we can discuss there?

/Marcus

On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu wrote:
> Hello there,
>
> RocksDB has the feature called "Compaction Filter" to allow the application to
> modify/delete a key-value during the background compaction.
> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>
> I'm wondering, is there a plan/value to add this into C* as well? Or is
> there already a similar thing in C*?
>
> Thanks
>
> --
> Dikang
Re: Too many sstables with DateTieredCompactionStrategy
why do you have 'timestamp_resolution': 'MILLISECONDS'? It should be left at the default (MICROSECONDS) unless you do "USING TIMESTAMP "-inserts, see https://issues.apache.org/jira/browse/CASSANDRA-11041

On Mon, Feb 29, 2016 at 2:36 PM, Noorul Islam K M wrote:
> Hi all,
>
> We are using the below compaction settings for a table:
>
> compaction = {'timestamp_resolution': 'MILLISECONDS',
> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>
> But it is creating too many sstables. Currently the number of sstables
> is 4. We have been injecting data for the last three days.
>
> We have set the compaction throughput to 128 MB/s:
>
> $ nodetool getcompactionthroughput
> Current compaction throughput: 128 MB/s
>
> But this is not helping.
>
> How can we control the number of sstables in this case?
>
> Thanks and Regards
> Noorul
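Per CASSANDRA-11041, unless the writes actually use USING TIMESTAMP with millisecond values, the fix is to put the resolution back to the default. A sketch of the corresponding statement, reusing the other settings from the question (the keyspace/table name is hypothetical):

```sql
ALTER TABLE ks.events
WITH compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
                   'timestamp_resolution': 'MICROSECONDS',
                   'max_sstable_age_days': '365',
                   'base_time_seconds': '60'};
```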
Re: JBOD device space allocation?
On Wed, Feb 24, 2016 at 6:28 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > Thanks. I didn't pay enough attention to that statement on my initial > reading of that post (which was where I became aware of the 3.2 behavior in > the first place.) > > Considering that the doc explicitly recommends that the byte ordered > partitioner not be used, that implies that the 3.2 JBOD behavior should be > used for all recommended partitioner use cases. > > I'm still not clear on when exactly a node would not have "localRanges" - > in terms of how the user would hit that scenario, or is than merely a > defensive check for a scenario which cannot normally be encountered? I > mean, it means that the endpoint is not responsible for any range of > tokens, but how can that ever be true, or is that simply if the user > configures the node to own zero tokens? But other than that, is there any > normal way a user could end up with a node that has no "localRanges"? > IIRC it is only defensive now - before https://issues.apache.org/jira/browse/CASSANDRA-9317 it could be empty during startup > > But even if the node owns no "local" ranges, can't it have replicated data > from RF=k-1 other nodes? Or does empty localRanges mean than the RF=k-1 > nodes that might have replicated data for this node are all also configured > to own zero tokens? Seems that way. But is there any reasonable scenario > under which the user would hit this? I mean, why would the code care either > way with respect to JBOD strategy for the case where no local data is > stored? 
> local ranges are all ranges the node should store - if you have 256 vnode tokens and RF=3, you will have 768 local ranges /Marcus > > > -- Jack Krupansky > > On Wed, Feb 24, 2016 at 2:15 AM, Marcus Eriksson <krum...@gmail.com> > wrote: > >> It is mentioned here btw: http://www.datastax.com/dev/blog/improving-jbod >> >> On Wed, Feb 24, 2016 at 8:14 AM, Marcus Eriksson <krum...@gmail.com> >> wrote: >> >>> If you don't use RandomPartitioner/Murmur3Partitioner you will get the >>> old behavior. >>> >>> On Wed, Feb 24, 2016 at 2:47 AM, Jack Krupansky < >>> jack.krupan...@gmail.com> wrote: >>> >>>> I just wanted to confirm whether my understanding of how JBOD allocates >>>> device space is correct of not... >>>> >>>> Pre-3.2: >>>> On each memtable flush Cassandra will select the directory (device) >>>> which has the most available space as a percentage of the total available >>>> space on all of the listed directories/devices. A random weighted value is >>>> used so it won't always pick the same directory/device with the most space, >>>> the goal being to balance writes for performance. >>>> >>>> As of 3.2: >>>> The ranges of tokens stored on the local node will be evenly >>>> distributed among the configured storage devices - even by token range, >>>> even if that may be uneven by actual partition sizes. The code presumes >>>> that each of the configured local storage devices has the same capacity. >>>> >>>> The relevant change in 3.2 appears to be: >>>> Make sure tokens don't exist in several data directories >>>> (CASSANDRA-6696) >>>> >>>> The code for the pre-3.2 model is still in 3.x - is there some other >>>> code path which will cause the pre-3.2 behavior even when runing 3.2 or >>>> later? 
>>>> >>>> I see this code which seems to allow for at least some cases where the >>>> pre-3.2 behavior would still be invoked, but I'm not sure what user-level >>>> cases that might be: >>>> >>>> if (!cfs.getPartitioner().splitter().isPresent() || >>>> localRanges.isEmpty()) >>>> return Collections.singletonList(new >>>> FlushRunnable(lastReplayPosition.get(), txn)); >>>> >>>> return createFlushRunnables(localRanges, txn); >>>> >>>> IOW, if the partitioner does not have a splitter present or the >>>> localRanges for the node cannot be determined. But... what exactly would a >>>> user do to cause that? >>>> >>>> There is no doc for this stuff - can a committer (or adventurous user!) >>>> confirm what is actually implemented, both pre and post 3.2? (I already >>>> pinged docs on this.) >>>> >>>> Or if anybody is actually using JBOD, what behavior they are seeing for >>>> device space utilization. >>>> >>>> Thanks! >>>> >>>> -- Jack Krupansky >>>> >>> >>> >> >
Re: JBOD device space allocation?
If you don't use RandomPartitioner/Murmur3Partitioner you will get the old behavior. On Wed, Feb 24, 2016 at 2:47 AM, Jack Krupansky wrote: > I just wanted to confirm whether my understanding of how JBOD allocates > device space is correct or not... > > Pre-3.2: > On each memtable flush Cassandra will select the directory (device) which > has the most available space as a percentage of the total available space > on all of the listed directories/devices. A random weighted value is used > so it won't always pick the same directory/device with the most space, the > goal being to balance writes for performance. > > As of 3.2: > The ranges of tokens stored on the local node will be evenly distributed > among the configured storage devices - even by token range, even if that > may be uneven by actual partition sizes. The code presumes that each of the > configured local storage devices has the same capacity. > > The relevant change in 3.2 appears to be: > Make sure tokens don't exist in several data directories (CASSANDRA-6696) > > The code for the pre-3.2 model is still in 3.x - is there some other code > path which will cause the pre-3.2 behavior even when running 3.2 or later? > > I see this code which seems to allow for at least some cases where the > pre-3.2 behavior would still be invoked, but I'm not sure what user-level > cases that might be: > > if (!cfs.getPartitioner().splitter().isPresent() || localRanges.isEmpty()) > return Collections.singletonList(new > FlushRunnable(lastReplayPosition.get(), txn)); > > return createFlushRunnables(localRanges, txn); > > IOW, if the partitioner does not have a splitter present or the > localRanges for the node cannot be determined. But... what exactly would a > user do to cause that? > > There is no doc for this stuff - can a committer (or adventurous user!) > confirm what is actually implemented, both pre and post 3.2? (I already > pinged docs on this.) 
> > Or if anybody is actually using JBOD, what behavior they are seeing for > device space utilization. > > Thanks! > > -- Jack Krupansky >
Re: JBOD device space allocation?
It is mentioned here btw: http://www.datastax.com/dev/blog/improving-jbod On Wed, Feb 24, 2016 at 8:14 AM, Marcus Eriksson <krum...@gmail.com> wrote: > If you don't use RandomPartitioner/Murmur3Partitioner you will get the old > behavior. > > On Wed, Feb 24, 2016 at 2:47 AM, Jack Krupansky <jack.krupan...@gmail.com> > wrote: > >> I just wanted to confirm whether my understanding of how JBOD allocates >> device space is correct or not... >> >> Pre-3.2: >> On each memtable flush Cassandra will select the directory (device) which >> has the most available space as a percentage of the total available space >> on all of the listed directories/devices. A random weighted value is used >> so it won't always pick the same directory/device with the most space, the >> goal being to balance writes for performance. >> >> As of 3.2: >> The ranges of tokens stored on the local node will be evenly distributed >> among the configured storage devices - even by token range, even if that >> may be uneven by actual partition sizes. The code presumes that each of the >> configured local storage devices has the same capacity. >> >> The relevant change in 3.2 appears to be: >> Make sure tokens don't exist in several data directories (CASSANDRA-6696) >> >> The code for the pre-3.2 model is still in 3.x - is there some other code >> path which will cause the pre-3.2 behavior even when running 3.2 or later? >> >> I see this code which seems to allow for at least some cases where the >> pre-3.2 behavior would still be invoked, but I'm not sure what user-level >> cases that might be: >> >> if (!cfs.getPartitioner().splitter().isPresent() || localRanges.isEmpty()) >> return Collections.singletonList(new >> FlushRunnable(lastReplayPosition.get(), txn)); >> >> return createFlushRunnables(localRanges, txn); >> >> IOW, if the partitioner does not have a splitter present or the >> localRanges for the node cannot be determined. But... what exactly would a >> user do to cause that? 
>> >> There is no doc for this stuff - can a committer (or adventurous user!) >> confirm what is actually implemented, both pre and post 3.2? (I already >> pinged docs on this.) >> >> Or if anybody is actually using JBOD, what behavior they are seeing for >> device space utilization. >> >> Thanks! >> >> -- Jack Krupansky >> > >
Re: 3k sstables during a repair incremental !!
The reason for this is probably https://issues.apache.org/jira/browse/CASSANDRA-10831 (which only affects 2.1) So, if you had problems with incremental repair and LCS before, upgrade to 2.1.13 and try again /Marcus On Wed, Feb 10, 2016 at 2:59 PM, horschi wrote: > Hi Jean, > > we had the same issue, but on SizeTieredCompaction. During repair the > number of SSTables and pending compactions were exploding. > > It not only affected latencies, at some point Cassandra ran out of heap. > > After the upgrade to 2.2 things got much better. > > regards, > Christian > > > On Wed, Feb 10, 2016 at 2:46 PM, Jean Carlo > wrote: > > Hi Horschi !!! > > > > I have the 2.1.12. But I think it is something related to the Leveled > compaction > > strategy. It is impressive that we passed from 6 sstables to 3k sstables. > > I think this will affect the latency on production because of the number of > > compactions going on > > > > > > > > Best regards > > > > Jean Carlo > > > > "The best way to predict the future is to invent it" Alan Kay > > > > On Wed, Feb 10, 2016 at 2:37 PM, horschi wrote: > >> > >> Hi Jean, > >> > >> which Cassandra version do you use? > >> > >> Incremental repair got much better in 2.2 (for us at least). > >> > >> kind regards, > >> Christian > >> > >> On Wed, Feb 10, 2016 at 2:33 PM, Jean Carlo > >> wrote: > >> > Hello guys! > >> > > >> > I am testing the repair inc in my Cassandra cluster. I am doing my test > >> > over > >> > these tables > >> > > >> > CREATE TABLE pns_nonreg_bench.cf3 ( > >> > s text, > >> > sp int, > >> > d text, > >> > dp int, > >> > m map , > >> > t timestamp, > >> > PRIMARY KEY (s, sp, d, dp) > >> > ) WITH CLUSTERING ORDER BY (sp ASC, d ASC, dp ASC) > >> > > >> > AND compaction = {'class': > >> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} > >> > AND compression = {'sstable_compression': > >> > 'org.apache.cassandra.io.compress.SnappyCompressor'} > >> > > >> > CREATE TABLE pns_nonreg_bench.cf1 ( > >> > ise text PRIMARY KEY, > >> > int_col int, > >> > text_col text, > >> > ts_col timestamp, > >> > uuid_col uuid > >> > ) WITH bloom_filter_fp_chance = 0.01 > >> > AND compaction = {'class': > >> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} > >> > AND compression = {'sstable_compression': > >> > 'org.apache.cassandra.io.compress.SnappyCompressor'} > >> > > >> > table cf1 > >> > Space used (live): 665.7 MB > >> > table cf2 > >> > Space used (live): 697.03 MB > >> > > >> > It happens that when I do repair -inc -par on these tables, cf2 got a > >> > peak > >> > of 3k sstables. When the repair finishes, it takes 30 min or more to > >> > finish > >> > all the compactions and return to 6 sstables. > >> > > >> > I am a little concerned about whether this will happen on production. Is it > >> > normal? > >> > > >> > Saludos > >> > > >> > Jean Carlo > >> > > >> > "The best way to predict the future is to invent it" Alan Kay > > > > >
Re: Transitioning to incremental repair
Bryan, this should be improved with https://issues.apache.org/jira/browse/CASSANDRA-10768 - could you try it out? On Tue, Dec 1, 2015 at 10:58 PM, Bryan Cheng wrote: > Sorry if I misunderstood, but are you asking about the LCS case? > > Based on our experience, I would absolutely recommend you continue with > the migration procedure. Even if the compaction strategy is the same, the > process of anticompaction is incredibly painful. We observed our test > cluster running 2.1.11 experiencing a dramatic increase in latency and not > responding to nodetool queries over JMX while anticompacting the largest > SSTables. This procedure also took several times longer than a standard > full repair. > > If you absolutely cannot perform the migration procedure, I believe 2.2.x > contains the changes to automatically set the RepairedAt flags after a full > repair, so you may be able to do a full repair on 2.2.x and then transition > directly to incremental without migrating (can someone confirm?) >
Re: Transitioning to incremental repair
Yes, it should now be safe to just run a repair with -inc -par to migrate to incremental repairs BUT, if you currently use for example the repair service in OpsCenter or Spotify's Cassandra reaper, you might still want to migrate the way it is documented, as you will have to run a full repair to migrate to incremental repairs, not many sub-range repairs, and that might not be possible for some users with a lot of data or with vnodes etc. I would also wait until https://issues.apache.org/jira/browse/CASSANDRA-10768 has been committed and released as it will improve anticompaction performance /Marcus On Tue, Dec 1, 2015 at 3:24 PM, Sam Klock wrote: > Hi folks, > > A question like this was recently asked, but I don't think anyone ever > supplied an unambiguous answer. We have a set of clusters currently > using sequential repair, and we'd like to transition them to > incremental repair. According to the documentation, this is a very > manual (and likely time-consuming) process: > > > http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html > > Our understanding is that this process is necessary for tables that use > LCS, as unrepaired tables are compacted using STCS and (without the > process described in the doc) all tables start in the unrepaired > state. The pain of this migration strategy is supposed to be offset by > the savings in undesired compaction activity. The docs aren't > especially clear, but it sounds like this strategy is not needed for > tables that use STCS. > > However, CASSANDRA-8004 (resolved against 2.1.2) appears intended to > have both the repaired and unrepaired sstable sets use the same > compaction strategy. 
It seems like that obviates the rationale for a > migration procedure, which is supported by offhand comments on this > list, e.g.: > > https://www.mail-archive.com/user%40cassandra.apache.org/msg40303.html > https://www.mail-archive.com/user%40cassandra.apache.org/msg44896.html > > In other words, it *looks* like the docs are obsolete, and the > migration process for existing clusters only consists of flipping the > switch (i.e., adding "-inc" to invocations of "nodetool repair"). > > Our questions: > > 1) Is our understanding of the status quo following 2.1.2 correct? > Does migrating existing clusters to incremental repair only require > adding the "-inc" argument, or is a process still required? > > 2) If a process is still required, have there been any changes since > 2.1.2? Are the docs up-to-date? > > 3) If there is no process or if the process has changed, are there > plans on the DataStax side to update the documentation accordingly? > > Thanks, > SK >
Re: LTCS Strategy Resulting in multiple SSTables
if you are on Cassandra 2.2, it is probably this: https://issues.apache.org/jira/browse/CASSANDRA-10270 On Tue, Sep 15, 2015 at 4:37 AM, Saladi Naidu wrote: > We are using the Leveled Compaction Strategy on a column family. Below are CFSTATS from two nodes in the same cluster: one node has 880 SSTables in L0 whereas the other node has just 1 SSTable in L0. In the node with multiple SSTables, all of them are small and have the same creation timestamp. We ran compaction; it did not result in much change, and the node remained with a huge number of SSTables. Due to this large number of SSTables, read performance is being impacted.
> 
> In the same cluster, under the same keyspace, we are observing this discrepancy in other column families as well. What is going wrong? What is the solution to fix this?
> 
> ---NODE1---
> Table: category_ranking_dedup
> SSTable count: 1
> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 2012037
> Space used (total): 2012037
> Space used by snapshots (total): 0
> SSTable Compression Ratio: 0.07677216119569073
> Memtable cell count: 990
> Memtable data size: 32082
> Memtable switch count: 11
> Local read count: 2842
> Local read latency: 3.215 ms
> Local write count: 18309
> Local write latency: 5.008 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 816
> Compacted partition minimum bytes: 87
> Compacted partition maximum bytes: 25109160
> Compacted partition mean bytes: 22844
> Average live cells per slice (last five minutes): 338.84588318085855
> Maximum live cells per slice (last five minutes): 10002.0
> Average tombstones per slice (last five minutes): 36.53307529908515
> Maximum tombstones per slice (last five minutes): 36895.0
> 
> ---NODE2---
> Table: category_ranking_dedup
> SSTable count: 808
> SSTables in each level: [808/4, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 291641980
> Space used (total): 291641980
> Space used by snapshots (total): 0
> SSTable Compression Ratio: 0.1431106696818256
> Memtable cell count: 4365293
> Memtable data size: 3742375
> Memtable switch count: 44
> Local read count: 2061
> Local read latency: 31.983 ms
> Local write count: 30096
> Local write latency: 27.449 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 54544
> Compacted partition minimum bytes: 87
> Compacted partition maximum bytes: 25109160
> Compacted partition mean bytes: 634491
> Average live cells per slice (last five minutes): 416.1780688985929
> Maximum live cells per slice (last five minutes): 10002.0
> Average tombstones per slice (last five minutes): 45.11547792333818
> Maximum tombstones per slice (last five minutes): 36895.0
> 
> Naidu Saladi
Re: Incremental repair from the get go
Starting up fresh it is totally OK to just start using incremental repairs On Thu, Sep 3, 2015 at 10:25 PM, Jean-Francois Gosselin < jfgosse...@gmail.com> wrote: > > On fresh install of Cassandra what's the best approach to start using > incremental repair from the get go (I'm using LCS) ? > > Run nodetool repair -inc after inserting a few rows , or we still need to > follow the migration procedure with sstablerepairedset ? > > From the documentation "... If you use the leveled compaction strategy > and perform an incremental repair for the first time, Cassandra performs > size-tiering on all SSTables because the repair/unrepaired status is > unknown. This operation can take a long time. To save time, migrate to > incremental repair one node at a time. ..." > > With almost no data size-tiering should be quick ? Basically is there a > short cut to avoid the migration via sstablerepairedset on a fresh install > ? > > Thanks > > JF >
Re: Garbage collector launched on all nodes at once
It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549 On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki mlowi...@gmail.com wrote: It looks like the memtable heap size is growing rapidly on some nodes (https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0). The drops are the points where nodes were restarted. On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki mlowi...@gmail.com wrote: Hi, Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection is launched at the same time on each node (see [1] for total GC duration per 5 seconds). RF is set to 3. Any ideas? [1] https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0 -- BR, Michał Łowicki -- BR, Michał Łowicki
Re: LCS Strategy, compaction pending tasks keep increasing
nope, but you can correlate I guess, tools/bin/sstablemetadata gives you sstable level information and, it is also likely that since you get so many L0 sstables, you will be doing size-tiered compaction in L0 for a while. On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal anis...@gmail.com wrote: @Marcus I did look and that is where I got the above, but it doesn't show any detail about moving from L0 -> L1. Any specific arguments I should try with? On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson krum...@gmail.com wrote: you need to look at nodetool compactionstats - there is probably a big L0 -> L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via the datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; I have 75 threads that are inserting data into the above table, with each thread having non-overlapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing, and it looks like from nodetool cfstats test.test_bits has SSTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0]. Why is compaction not kicking in? thanks anishek
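Marcus's point about the L0 backlog can be made concrete with a little arithmetic on the cfstats line Anishek posted. A hedged sketch - the "count/cap" notation and the 32-sstable L0 figure mirror what is said in these threads, while the function names are invented for illustration:

```python
# Toy parser for the "SSTables in each level" line from nodetool cfstats.
# An entry like "154/4" means 154 sstables sitting in a level whose target
# cap is 4; a big overflow in L0 is the situation where LCS falls back to
# size-tiered compaction in L0.
def parse_levels(levels_str):
    counts = []
    for tok in levels_str.strip("[] ").split(","):
        counts.append(int(tok.strip().split("/")[0]))  # "154/4" -> 154
    return counts

def l0_backlogged(levels_str, stcs_in_l0_threshold=32):
    # LCS starts size-tiering L0 once there are "too many" L0 sstables;
    # 32 is the figure quoted elsewhere in this archive.
    return parse_levels(levels_str)[0] >= stcs_in_l0_threshold
```

Applied to the posted line, `[154/4, 8, 0, ...]` means 154 sstables are stuck in L0 against a cap of 4, which is why pending compactions keep growing.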
Re: LCS Strategy, compaction pending tasks keep increasing
you need to look at nodetool compactionstats - there is probably a big L0 -> L1 compaction going on that blocks other compactions from starting On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal anis...@gmail.com wrote: the some_bits column has about 14-15 bytes of data per key. On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal anis...@gmail.com wrote: Hello, I am inserting about 100 million entries via the datastax-java driver to a cassandra cluster of 3 nodes. Table structure is as create keyspace test with replication = {'class': 'NetworkTopologyStrategy', 'DC' : 3}; CREATE TABLE test_bits(id bigint primary key , some_bits text) with gc_grace_seconds=0 and compaction = {'class': 'LeveledCompactionStrategy'} and compression={'sstable_compression' : ''}; I have 75 threads that are inserting data into the above table, with each thread having non-overlapping keys. I see that the number of pending tasks via nodetool compactionstats keeps increasing, and it looks like from nodetool cfstats test.test_bits has SSTable levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0]. Why is compaction not kicking in? thanks anishek
Re: RepairException on C* 2.1.3
Issue here is that getPosition returns null. I think this was fixed in https://issues.apache.org/jira/browse/CASSANDRA-8750 On Fri, Apr 17, 2015 at 10:55 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Apr 17, 2015 at 11:40 AM, Mark Greene green...@gmail.com wrote: I'm receiving an exception when I run a repair process via: 'nodetool repair -par keyspace' This JIRA claims fixed in 2.1.3, but I believe I have heard at least one other report that it isn't: https://issues.apache.org/jira/browse/CASSANDRA-8211 If I were you, I would: a) file a JIRA at http://issues.apache.org b) reply to the list telling us the URL of your issue =Rob
Re: nodetool cleanup error
It should work on 2.0.13. If it fails with that assertion, you should just retry. If that does not work, and you can reproduce this, please file a ticket /Marcus On Tue, Mar 31, 2015 at 9:33 AM, Amlan Roy amlan@cleartrip.com wrote: Hi, Thanks for the reply. Since nodetool cleanup is not working even after upgrading to 2.0.13, is it recommended to go to an older version (2.0.11 for example; with 2.0.12 it also did not work)? Is there any other way of cleaning data from existing nodes after adding a new node? Regards, Amlan On 31-Mar-2015, at 5:00 am, Yuki Morishita mor.y...@gmail.com wrote: Looks like the issue is https://issues.apache.org/jira/browse/CASSANDRA-9070. On Mon, Mar 30, 2015 at 6:25 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Mar 30, 2015 at 4:21 PM, Amlan Roy amlan@cleartrip.com wrote: Thanks for the reply. I have upgraded to 2.0.13. Now I get the following error. If cleanup is still throwing exceptions for you on 2.0.13 with some sstables you have, I would strongly consider: 1) file a JIRA (http://issues.apache.org) and attach / offer the sstables for debugging 2) let the list know the JIRA id of the ticket =Rob -- Yuki Morishita t:yukim (http://twitter.com/yukim)
Re: Stable cassandra build for production usage
Do you see the segfault or do you see https://issues.apache.org/jira/browse/CASSANDRA-8716 ? On Tue, Mar 17, 2015 at 10:34 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Now that 2.0.13 is out, I don't see that the nodetool cleanup issue (https://issues.apache.org/jira/browse/CASSANDRA-8718) has been fixed yet. The bug shows priority Minor. Is anybody facing this issue? Thanks Ajay On Thu, Mar 12, 2015 at 11:41 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Mar 12, 2015 at 10:50 AM, Ajay ajay.ga...@gmail.com wrote: Please suggest what is the best option in this for production deployment in EC2, given that we are deploying a Cassandra cluster for the 1st time (so it is likely that we will add more data centers/nodes and make schema changes in the initial few months) Voting for 2.0.13 is in process. I'd wait for that. But I don't need OpsCenter. =Rob
Re: C* 2.1.3 - Incremental replacement of compacted SSTables
We had some issues with it right before we wanted to release 2.1.3 so we temporarily(?) disabled it, it *might* get removed entirely in 2.1.4, if you have any input, please comment on this ticket: https://issues.apache.org/jira/browse/CASSANDRA-8833 /Marcus On Sat, Feb 21, 2015 at 7:29 PM, Mark Greene green...@gmail.com wrote: I saw in the NEWS.txt that this has been disabled. Does anyone know why that was the case? Is it temporary just for the 2.1.3 release? Thanks, Mark Greene
Re: How to deal with too many sstables
https://issues.apache.org/jira/browse/CASSANDRA-8635 On Tue, Feb 3, 2015 at 5:47 AM, 曹志富 cao.zh...@gmail.com wrote: Just run nodetool repair. The nodes which have many sstables are the newest in my cluster. Before adding these nodes to my cluster, my cluster had no automatic compaction because it is a write-only cluster. thanks. -- 曹志富 手机:18611121927 邮箱:caozf.zh...@gmail.com 微博:http://weibo.com/boliza/ 2015-02-03 12:16 GMT+08:00 Flavien Charlon flavien.char...@gmail.com: Did you run incremental repair? Incremental repair is broken in 2.1 and tends to create way too many SSTables. On 2 February 2015 at 18:05, 曹志富 cao.zh...@gmail.com wrote: Hi, all: I have an 18-node C* cluster with Cassandra 2.1.2. Some nodes have about 40,000+ sstables. My compaction strategy is STCS. Could someone give me some solution to deal with this situation. Thanks. -- 曹志富 手机:18611121927 邮箱:caozf.zh...@gmail.com 微博:http://weibo.com/boliza/
Re: incremential repairs - again
Hi, unsure what you mean by automatically, but you should use -par -inc when you repair. And you should wait until 2.1.3 (which will be out very soon) before doing this; we have fixed many issues with incremental repairs /Marcus On Thu, Jan 29, 2015 at 7:44 AM, Roland Etzenhammer r.etzenham...@t-online.de wrote: Hi, a short question about the new incremental repairs again. I am running 2.1.2 (for testing). Marcus pointed out to me that 2.1.2 should do incremental repairs automatically, so I rolled back all steps taken. I expect that routine repair times will decrease when I do not put much new data on the cluster. But they don't - they are constant at about 1000 minutes per node, so I extracted all "Repaired at" values with sstablemetadata and I can't see any recent date. I put several GB of data into the cluster in 2015 and I run nodetool repair -pr on every node regularly. Am I still missing something? Or is this one of the issues with 2.1.2 (CASSANDRA-8316)? Thanks for hints, Jan
Re: incremental repairs
If you are on 2.1.2+ (or using STCS) you don't need those steps (should probably update the blog post). Now we keep separate levelings for the repaired/unrepaired data and move the sstables over after the first incremental repair. But, if you are running 2.1 in production, I would recommend that you wait until 2.1.3 is out; https://issues.apache.org/jira/browse/CASSANDRA-8316 fixes a bunch of issues with incremental repairs. -pr is sufficient, same rules apply as before: if you run -pr you need to repair every node /Marcus On Thu, Jan 8, 2015 at 9:16 AM, Roland Etzenhammer r.etzenham...@t-online.de wrote: Hi, I am currently trying to migrate my test cluster to incremental repairs. These are the steps I'm doing on every node: - touch marker - nodetool disableautocompaction - nodetool repair - cassandra stop - find all *Data*.db files older than marker - invoke sstablerepairedset on those - cassandra start This is essentially what http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1 says. After all nodes are migrated this way, I think I need to run my regular repairs more often and they should be faster afterwards. But do I need to run nodetool repair or is nodetool repair -pr sufficient? And do I need to reenable autocompaction? Or do I need to compact myself? Thanks for any input, Roland
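The "find all *Data*.db files older than marker" step in Roland's list is the fiddly one, so here is a small sketch of just that selection logic. Everything here is fabricated for illustration (fake sstable names, a temp directory, explicit mtimes instead of real flush times); the nodetool and sstablerepairedset invocations themselves appear only as comments:

```python
import os
import tempfile
from pathlib import Path

# Files whose mtime predates the marker existed before "nodetool repair"
# ran, so (per the migration procedure above) they are the ones that would
# be handed to sstablerepairedset.
def data_files_older_than(data_dir, marker):
    cutoff = os.path.getmtime(marker)
    return sorted(p.name for p in Path(data_dir).glob("*Data*.db")
                  if os.path.getmtime(p) < cutoff)

# Demo with fabricated file names; mtimes are set explicitly so the example
# does not depend on filesystem timestamp resolution.
demo = Path(tempfile.mkdtemp())
old_file = demo / "ks-cf-ka-1-Data.db"   # stands in for: flushed before repair
new_file = demo / "ks-cf-ka-2-Data.db"   # stands in for: flushed after repair
marker = demo / "marker"                 # the "touch marker" step
for f, mtime in [(old_file, 1000), (marker, 2000), (new_file, 3000)]:
    f.write_text("")
    os.utime(f, (mtime, mtime))

to_mark_repaired = data_files_older_than(demo, marker)  # only the old file
```

In the real procedure you would run this selection while the node is stopped, then pass the resulting paths to sstablerepairedset before starting Cassandra again.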
Re: incremental repairs
Yes, you should reenable autocompaction /Marcus On Thu, Jan 8, 2015 at 10:33 AM, Roland Etzenhammer r.etzenham...@t-online.de wrote: Hi Marcus, thanks for that quick reply. I did also look at: http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html which describes the same process; it's 2.1.x, so I see that 2.1.2+ is not covered there. I did upgrade my test cluster to 2.1.2 and with your hint I took a look at sstablemetadata from a non-migrated node, and there are indeed "Repaired at" entries on some sstables already. So if I got this right, in 2.1.2+ there is nothing to do to switch to incremental repairs (apart from running the repairs themselves). But one thing I see during testing is that there are many sstables with small size: - in total there are 5521 sstables on one node - 115 sstables are bigger than 1MB - 4949 sstables are smaller than 10kB I don't know where they came from - I found one piece of information where this happened when cassandra was low on heap, which happened to me while running tests (the suggested solution is to trigger compaction via JMX). Question for me: I did disable autocompaction on some nodes of our test cluster as the blog and docs said. Should/can I reenable autocompaction again with incremental repairs? Cheers, Roland
Re: Compaction Strategy guidance
If you are that write-heavy you should definitely go with STCS, LCS optimizes for reads by doing more compactions /Marcus On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov aiva...@iponweb.net wrote: Hi Jean-Armel, Nikolai, 1. Increasing the sstable size doesn't work (well, I think, unless we overscale - add more nodes than really necessary, which is prohibitive for us in a way). Essentially there is no change. I gave up and will go for STCS ;-( 2. We use 2.0.11 as of now 3. We are running on EC2 c3.8xlarge instances with EBS volumes for data (GP SSD) Jean-Armel, I believe that what you say about many small instances is absolutely true. But it is not good in our case - we write a lot and almost never read what we've written. That is, we want to be able to read everything, but in reality we hardly read 1%, I think. This implies that smaller instances are of no use in terms of read performance for us. And generally instances/cpu/ram are more expensive than storage. So, we really would like to have instances with large storage. Andrei. On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce jaluc...@gmail.com wrote: Hi Andrei, Hi Nikolai, Which version of C* are you using? There are some recommendations about the max storage per node: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to handle 10x (3-5TB). I have the feeling that those recommendations are sensitive to many criteria such as: - your hardware - the compaction strategy - ... It looks like LCS loosens those limitations. Increasing the size of sstables might help if you have enough CPU and you can put more load on your I/O system (@Andrei, I am interested in the results of your experimentation with large sstable files) From my point of view, there are some usage patterns where it is better to have many small servers than a few large servers. Probably, it is better to have many small servers if you need LCS for large tables. 
Just my 2 cents. Jean-Armel 2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com: On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com wrote: One of the obvious recommendations I have received was to run more than one instance of C* per host. Makes sense - it will reduce the amount of data per node and will make better use of the resources. This is usually a Bad Idea to do in production. =Rob
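Marcus's "LCS optimizes for reads by doing more compactions" can be put into rough numbers. The model below is the usual textbook back-of-envelope for write amplification (how many times each flushed byte gets rewritten by compaction), with invented function names and deliberately crude constants - treat it as an intuition pump for the trade-off being discussed, not a measurement of Cassandra:

```python
import math

def lcs_write_amp(total_bytes, sstable_mb=160, fanout=10):
    # Rough model: L1 holds ~fanout sstables, L2 ~fanout^2, etc., and a
    # byte is rewritten roughly `fanout` times per level it climbs through.
    bytes_l1 = sstable_mb * 1024 * 1024 * fanout
    levels = 1 + max(0, math.ceil(math.log(max(total_bytes / bytes_l1, 1), fanout)))
    return levels * fanout

def stcs_write_amp(total_bytes, flush_mb=64, tier_factor=4):
    # Rough model: a byte is rewritten about once per size tier it climbs.
    flushed = flush_mb * 1024 * 1024
    return max(1, math.ceil(math.log(max(total_bytes / flushed, 1), tier_factor)))
```

On this crude model a terabyte-scale, write-heavy, rarely-read table pays several times more compaction I/O under LCS than under STCS, which is why the advice above is to go with STCS for that workload.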
Re: LCS: sstables grow larger
I suspect they are getting size-tiered in L0 - if you have too many sstables in L0, we will do size-tiered compaction on sstables in L0 to improve performance. Use tools/bin/sstablemetadata to get the level for those sstables; if they are in L0, that is probably the reason. /Marcus On Tue, Nov 18, 2014 at 2:06 PM, Andrei Ivanov aiva...@iponweb.net wrote: Dear all, I have the following problem: - C* 2.0.11 - LCS with default 160MB - Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx) - Compacted partition mean bytes: 6750 (for cf/table xxx.xxx) I would expect the sstables to be at most ~160MB. Despite this I see files like: 192M Nov 18 13:00 xxx-xxx-jb-15580-Data.db or 631M Nov 18 13:03 xxx-xxx-jb-15583-Data.db Am I missing something? What could be the reason? (Actually this is a fresh cluster - on an old one I'm seeing 500GB sstables). I'm getting really desperate in my attempt to understand what's going on. Thanks in advance Andrei.
Re: LCS: sstables grow larger
No, they will get compacted into smaller sstables in L1+ eventually (once you have less than 32 sstables in L0, an ordinary L0 -> L1 compaction will happen) But, if you consistently get many files in L0 it means that compaction is not keeping up with your inserts and you should probably expand your cluster (or consider going back to SizeTieredCompactionStrategy for the tables that take that many writes) /Marcus On Tue, Nov 18, 2014 at 2:49 PM, Andrei Ivanov aiva...@iponweb.net wrote: Marcus, thanks a lot! It explains a lot - those huge tables are indeed at L0. It seems that they start to appear as a result of some massive operations (join, repair, rebuild). What's their fate in the future? Will they continue to propagate like this through levels? Is there anything that can be done to avoid/solve/prevent this? My fears here are around a feeling that those big tables (like in my old cluster) will be hardly compactable in the future... Sincerely, Andrei. On Tue, Nov 18, 2014 at 4:27 PM, Marcus Eriksson krum...@gmail.com wrote: I suspect they are getting size-tiered in L0 - if you have too many sstables in L0, we will do size-tiered compaction on sstables in L0 to improve performance. Use tools/bin/sstablemetadata to get the level for those sstables; if they are in L0, that is probably the reason. /Marcus On Tue, Nov 18, 2014 at 2:06 PM, Andrei Ivanov aiva...@iponweb.net wrote: Dear all, I have the following problem: - C* 2.0.11 - LCS with default 160MB - Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx) - Compacted partition mean bytes: 6750 (for cf/table xxx.xxx) I would expect the sstables to be at most ~160MB. Despite this I see files like: 192M Nov 18 13:00 xxx-xxx-jb-15580-Data.db or 631M Nov 18 13:03 xxx-xxx-jb-15583-Data.db Am I missing something? What could be the reason? (Actually this is a fresh cluster - on an old one I'm seeing 500GB sstables). I'm getting really desperate in my attempt to understand what's going on. 
Thanks in advance Andrei.
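A sketch of Marcus's sstablemetadata suggestion: pull the level out of the tool's output to see which sstables are still in L0. The "SSTable Level" line is what the 2.0-era tool prints; the exact output format (and the data paths in the commented loop) may differ on your version, so treat this as an assumption to verify.

```shell
level_of() {
  # extract N from a line like "SSTable Level: 0"
  awk -F': *' '/SSTable Level/ {print $2}'
}

# Parsing a captured sstablemetadata output instead of running the tool:
sample="SSTable: /data/xxx/xxx-xxx-jb-15583
SSTable Level: 0
Estimated droppable tombstones: 0.0"
lvl=$(printf '%s\n' "$sample" | level_of)
echo "level=$lvl"

# On a node (hypothetical paths):
# for f in /var/lib/cassandra/data/ks/cf/*-Data.db; do
#   printf '%s ' "$f"; tools/bin/sstablemetadata "$f" | level_of
# done
```

Any oversized sstable reporting level 0 matches Marcus's explanation; oversized sstables in L1+ would be something else.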
Re: LCS: sstables grow larger
you should stick to as small nodes as possible, yes :)

There are a few relevant tickets related to bootstrap and LCS:
https://issues.apache.org/jira/browse/CASSANDRA-6621 - startup with -Dcassandra.disable_stcs_in_l0=true to not do STCS in L0
https://issues.apache.org/jira/browse/CASSANDRA-7460 - (3.0) send source sstable level when bootstrapping

On Tue, Nov 18, 2014 at 3:33 PM, Andrei Ivanov aiva...@iponweb.net wrote:

OK, got it. Actually, my problem is not that we are constantly having many files at L0. Normally there are quite a few of them - that is, nodes are managing to compact incoming writes in a timely manner. But it looks like when we join a new node, it receives tons of files from existing nodes (and they end up at L0, right?) and that seems to be where our problems start. In practice, in what I call the old cluster, compaction became a problem at ~2TB nodes. (You know, we are trying to save something on HW - we are running on EC2 with EBS volumes.) Do I get it right that we better stick to smaller nodes?
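The CASSANDRA-6621 workaround from the links above, written out as the cassandra-env.sh line it amounts to. This is a sketch: the flag name comes from the ticket, and the advice to remove it afterwards is an inference from why STCS-in-L0 exists at all (protecting reads when L0 backs up).

```shell
# cassandra-env.sh on the joining node: disable STCS in L0 during
# bootstrap so streamed sstables can be levelled normally instead of
# being size-tiered together in L0 (CASSANDRA-6621).
# Remove the flag once the node has finished joining.
JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"
```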
Re: Question on how to run incremental repairs
On Wed, Oct 22, 2014 at 2:39 PM, Juho Mäkinen juho.maki...@gmail.com wrote:

I'm having problems understanding how incremental repairs are supposed to be run. If I try to do nodetool repair -inc cassandra will complain that "It is not possible to mix sequential repair and incremental repairs". However it seems that running nodetool repair -inc -par does the job, but I couldn't be sure if this is the correct (and only?) way to run incremental repairs?

yes, you need to run with -par

Previously I ran repairs with nodetool repair -pr on each node at a time, so that I could minimise the performance hit. I've understood that doing a single nodetool repair -inc -par command runs it on all machines in the entire cluster, so doesn't that cause a big performance penalty? Can I run incremental repairs on one node at a time?

repair still works the same way, you can do with -pr, and no, repair -inc -par does not run on all nodes, it repairs all ranges that the node you are executing it on owns, so, if you have rf = 3 you will need to run repair (without -pr) on every third node

If running nodetool repair -inc -par every night in a single node is fine, should I still spread them out so that each node takes a turn executing this command each night?

use your old schedule, repair works the same way, just that incremental repair does not include already repaired sstables

Last question is a bit deeper: What I've understood is that incremental repairs don't do repairs on SSTables which have already been repaired, but doesn't this mean that these repaired SSTables can't be checked towards missing or incorrect data?

no, if you get a corrupt sstable for example, you will need to run an old style repair on that node (without -inc).
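The "use your old schedule" advice above can be sketched as a rotating nightly cron job: each night one node runs the incremental repair. Hostnames and cluster size here are made up; the rotation rule (day-of-year modulo node count) is just one simple way to take turns.

```shell
# Rotating nightly incremental-repair schedule (sketch).
nodes=(cass01 cass02 cass03 cass04 cass05 cass06)   # hypothetical hosts
day=$(date +%j)                     # day of year, 001..366
idx=$(( 10#$day % ${#nodes[@]} ))   # 10# stops "008" being read as octal
echo "tonight's repair runs on ${nodes[$idx]}"
# ssh "${nodes[$idx]}" nodetool repair -inc -par
```

With rf = 3 and repair without -pr, Marcus notes every third node needs a run; adjust the rotation accordingly.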
Re: stream_throughput_outbound_megabits_per_sec
On Thu, Oct 16, 2014 at 1:54 AM, Donald Smith donald.sm...@audiencescience.com wrote:

stream_throughput_outbound_megabits_per_sec is the timeout per operation on the streaming socket. The docs recommend not to have it too low (because a timeout causes streaming to restart from the beginning). But the default 0 never times out. What's a reasonable value?

no, it is not a timeout, it states how fast sstables are streamed

Does it stream an entire SSTable in one operation? I doubt it. How large is the object it streams in one operation? I'm tempted to put the timeout at 30 seconds or 1 minute. Is that too low?

unsure what you mean by 'operation' here, but it is one tcp connection, streaming the whole file (if that's what we want)

/Marcus
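Since the setting is a bandwidth cap, not a timeout, the main thing to get right is the unit: megabits per second. A small sketch of the conversion (the 200 Mbit/s figure is the cassandra.yaml default of that era; verify against your own yaml):

```shell
# stream_throughput_outbound_megabits_per_sec caps streaming bandwidth.
# Default 200 megabits/s is about 25 megabytes/s of sstable data:
mbits=200
echo "$(( mbits / 8 )) MB/s"
# The cap can also be changed at runtime, without a restart:
# nodetool setstreamthroughput 400
```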
Re: Disabling compaction
what version are you on?

On Thu, Oct 9, 2014 at 10:33 PM, Parag Shah ps...@proofpoint.com wrote:

Hi all, I am trying to disable compaction for a few select tables. Here is a definition of one such table:

CREATE TABLE blob_2014_12_31 (
  blob_id uuid,
  blob_index int,
  blob_chunk blob,
  PRIMARY KEY (blob_id, blob_index)
) WITH bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'enabled': 'false', 'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

I have set compaction 'enabled': 'false' on the above table. However, I do see compactions being run for this node:

-bash-3.2$ nodetool compactionstats
pending tasks: 55
compaction type  keyspace         table            completed    total        unit   progress
Compaction       ids_high_awslab  blob_2014_11_15  18122816990  35814893020  bytes  50.60%
Compaction       ids_high_awslab  blob_2014_12_31  18576750966  34242866468  bytes  54.25%
Compaction       ids_high_awslab  blob_2014_12_15  19213914904  35956698600  bytes  53.44%
Active compaction remaining time : 0h49m46s

Can someone tell me why this is happening? Do I need to set the compaction threshold to 0 0?

Regards, Parag
Re: Disabling compaction
this is fixed in 2.0.8: https://issues.apache.org/jira/browse/CASSANDRA-7187

/Marcus

On Fri, Oct 10, 2014 at 7:11 PM, Parag Shah ps...@proofpoint.com wrote:

Cassandra Version: 2.0.7. In my application, I am using Cassandra Java Driver 2.0.2.

Thanks, Parag

From: Marcus Eriksson krum...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Thursday, October 9, 2014 at 11:56 PM
To: user@cassandra.apache.org
Subject: Re: Disabling compaction

what version are you on?
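Two ways this can be handled, sketched below. The nodetool line is the per-node "0 0 thresholds" route the poster asks about (worth verifying on your exact version that 0 0 disables minor compactions); after upgrading to 2.0.8+, the schema flag alone behaves as intended.

```shell
# Per node, until the upgrade (run on each node, per table):
#   nodetool setcompactionthreshold ids_high_awslab blob_2014_12_31 0 0
# On 2.0.8+, the schema setting is honoured cluster-wide:
cql="ALTER TABLE blob_2014_12_31
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'enabled': 'false'};"
printf '%s\n' "$cql"
```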
Re: Would warnings about overlapping SStables explain high pending compactions?
Not really. What version are you on? Do you have pending compactions and no ongoing compactions?

/Marcus

On Wed, Sep 24, 2014 at 11:35 PM, Donald Smith donald.sm...@audiencescience.com wrote:

On one of our nodes we have lots of pending compactions (499). In the past we've seen pending compactions go up to 2400 and all the way back down again. Investigating, I saw warnings such as the following in the logs about overlapping SStables and about needing to run "nodetool scrub" on a table. Would the overlapping SStables explain the pending compactions?

WARN [RMI TCP Connection(2)-10.5.50.30] 2014-09-24 09:14:11,207 LeveledManifest.java (line 154) At level 1, SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC-jb-388233-Data.db') [DecoratedKey(-6112875836465333229, 3366636664393031646263356234663832383264616561666430383739383738), DecoratedKey(-4509284829153070912, 3366336562386339376664376633353635333432636662373739626465393636)] overlaps SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC_blob-jb-388150-Data.db') [DecoratedKey(-4834684725563291584, 336633623334363664363632666365303664333936336337343566373838), DecoratedKey(-4136919579566299218, 3366613535646662343235336335633862666530316164323232643765323934)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866 C: (206) 819-5965 F: (646) 443-2333
dona...@audiencescience.com
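The WARN line itself names the sstable to act on; a small sketch that pulls keyspace and table out of the logged path to build the scrub command the message asks for (assuming the 1.x/2.x-era /data/data/&lt;keyspace&gt;/&lt;table&gt;/... layout visible in the log):

```shell
# Extract keyspace/table from the overlapping-sstable warning:
warn="At level 1, SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC-jb-388233-Data.db') overlaps ..."
path=$(printf '%s\n' "$warn" | sed -n "s/.*path='\([^']*\)'.*/\1/p")
ks=$(basename "$(dirname "$(dirname "$path")")")   # keyspace dir
cf=$(basename "$(dirname "$path")")                # table dir
echo "nodetool scrub $ks $cf"
```

Snapshotting before scrubbing is cheap insurance, since scrub rewrites sstables.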
Re: Worse perf after Row Caching version 1.2.5:
select * from table will not populate row cache, but if the row is cached, it will be used. You need to use select * from table where X=Y to populate row cache.

when setting caching = rows_only you disable key cache which might hurt your performance.

On Wed, Feb 12, 2014 at 9:05 PM, PARASHAR, BHASKARJYA JAY bp1...@att.com wrote:

Thanks Jonathan, I have the cfstats but our prod team has changed some configs after my post and I do not have the cfhistograms information now.

No of nodes: 3
RAM: 472GB
Cassandra version: 1.2.5

I am pasting the cfstats below.

Regards, Jay

CREATE TABLE EnablerCreditReasonInfo (
  key text PRIMARY KEY,
  creditReasonDescription text
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='ROWS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

CFStats:
Column Family: EnablerCreditReasonInfo
SSTable count: 3
Space used (live): 108067
Space used (total): 108067
Number of Keys (estimate): 1920
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 2232
Compacted row minimum size: 61
Compacted row maximum size: 149
Compacted row mean size: 100

From: Jonathan Lacefield [mailto:jlacefi...@datastax.com]
Sent: Tuesday, February 11, 2014 10:43 AM
To: user@cassandra.apache.org
Subject: Re: Worse perf after Row Caching version 1.2.5:

Hello,

Please paste the output of cfhistograms for these tables. Also, what does your environment look like: number of nodes, disk drive configs, memory, C* version, etc.

Thanks, Jonathan

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield

On Tue, Feb 11, 2014 at 10:26 AM, PARASHAR, BHASKARJYA JAY bp1...@att.com wrote:

Hi, I have two tables and I enabled row caching for both of them using CQL. These two CF's are very small, one with about 300 rows and the other 2000 rows. The rows themselves are small. Cassandra heap: 8gb.

a. alter table TABLE_X with caching = 'rows_only';
b. alter table TABLE_Y with caching = 'rows_only';

I also changed row_cache_size_in_mb: 1024 in the cassandra.yaml file. After extensive testing, it seems the performance of Table_X degraded from 600ms to 750ms and Table_Y gained about 10 ms (from 188ms to 177 ms).

More info: Table_X is always queried with Select * from Table_X; Cfstats on Table_X shows Read Latency: NaN ms. I assumed that since we select all the rows, the entire table would be cached. Table_Y has a secondary index and is queried on that index.

Would appreciate any input why the performance is worse and how to enable row caching for these two tables.

Thanks, Jay
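Given the reply above, the likely regression is that 'rows_only' switched the key cache off. A sketch of the fix under the 1.2-era caching values (all / keys_only / rows_only / none); table names are the placeholders from the post:

```shell
# Keep both key cache and row cache on for the two small tables:
cql="ALTER TABLE TABLE_X WITH caching = 'all';
ALTER TABLE TABLE_Y WITH caching = 'all';"
printf '%s\n' "$cql"
```

Remember the other point in the reply: only key-based reads (SELECT ... WHERE key = ?) populate the row cache, so a bare SELECT * never warms it.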
Re: Migrate data from acunu to Apache cassandra 1.1.12
You need an up to date Cassandra; files with -ic- are for Cassandra 1.2.5+.

/Marcus

On Mon, Feb 3, 2014 at 8:31 AM, Aravindan T aravinda...@tcs.com wrote:

Hi, there is a necessity where I need to migrate data from Acunu Cassandra to Apache Cassandra. As part of it, the column family snapshots are taken using the nodetool command, but while loading into the Apache Cassandra with the help of sstableloader, I get errors like below:

WARN 12:13:25,299 Invalid file 'samplewatchtower-student-ic-5-TOC.txt' in data directory /samplewatchtower/student.
Skipping file samplewatchtower-student-ic-6-Data.db, error opening it: EOF after 0 bytes out of 8
WARN 12:13:25,315 Invalid file 'samplewatchtower-student-ic-6-Summary.db' in data directory /samplewatchtower/student.
Skipping file samplewatchtower-student-ic-5-Data.db, error opening it: EOF after 0 bytes out of 8
WARN 12:13:25,316 Invalid file 'samplewatchtower-student-ic-6-TOC.txt' in data directory /samplewatchtower/student.
WARN 12:13:25,316 Invalid file 'samplewatchtower-student-ic-5-Summary.db' in data directory /samplewatchtower/student.
No sstables to stream.

Can you please help in how to perform this data migration successfully?

Aravind
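The "-ic-" Marcus points at is the sstable on-disk format version embedded in the filename, which is why a pre-1.2.5 node cannot read these files. A small sketch of the 1.x-era filename layout (&lt;keyspace&gt;-&lt;columnfamily&gt;-&lt;version&gt;-&lt;generation&gt;-&lt;Component&gt;):

```shell
# Split an sstable filename into its parts to read off the format version:
f="samplewatchtower-student-ic-5-Data.db"
IFS=- read -r ks cf ver gen comp <<EOF
$f
EOF
echo "keyspace=$ks table=$cf format=$ver generation=$gen component=$comp"
```

If the target cluster's own sstables carry an older version token than the files you are loading, upgrade the target first.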
Re: Endless loop LCS compaction
this has been fixed: https://issues.apache.org/jira/browse/CASSANDRA-6496

On Wed, Dec 18, 2013 at 2:51 PM, Desimpel, Ignace ignace.desim...@nuance.com wrote:

Hi, would it not be possible that in some rare cases these 'small' files are created also, and thus resulting in the same endless loop behavior? Like a storm on the server making the memtables flush. When the storm dies down, the compaction would then have the same problem?

Regards, Ignace

-----Original Message-----
From: Desimpel, Ignace
Sent: Tuesday, 12 November 2013 09:32
To: 'Chris Burroughs'
Subject: RE: Endless loop LCS compaction

I think that regardless of the size, the code should not go into an endless loop.

-----Original Message-----
From: Chris Burroughs [mailto:chris.burrou...@gmail.com]
Sent: Friday, 8 November 2013 16:49
To: user@cassandra.apache.org
Cc: Desimpel, Ignace
Subject: Re: Endless loop LCS compaction

On 11/07/2013 06:48 AM, Desimpel, Ignace wrote:
Total data size is only 3.5GB. Column family was created with SSTableSize: 10 MB

You may want to try a significantly larger size. https://issues.apache.org/jira/browse/CASSANDRA-5727
Re: Cassandra is holding too many deleted file descriptors
yeah this is known, and we are looking for a fix: https://issues.apache.org/jira/browse/CASSANDRA-6275

if you have a simple way of reproducing, please add a comment

On Thu, Nov 14, 2013 at 10:53 AM, Murthy Chelankuri kmurt...@gmail.com wrote:

I see lots of these deleted file descriptors Cassandra is holding. In my case, out of 90K file descriptors, 80.5K are these descriptors. Because of this Cassandra is not performing well. Can someone please tell what I am doing wrong?

lr-x-- 1 root root 64 Nov 14 08:25 10875 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10876 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10877 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-133-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10878 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-124-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10879 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:11 1088 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10880 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-133-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10881 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10882 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-124-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10883 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10884 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-133-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10885 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10886 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-124-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10887 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
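Counting how many deleted sstables the process still holds gives a quick measure of how bad the leak is. The PID discovery is a sketch; the counting itself is demonstrated on a captured listing like the one in the post.

```shell
# On a node (sketch):
#   pid=$(pgrep -f CassandraDaemon)
#   ls -l "/proc/$pid/fd" | grep -c '(deleted)'
# The same count over a captured listing:
listing="lr-x-- 1 root root 64 Nov 14 08:25 10875 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10876 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)"
n=$(printf '%s\n' "$listing" | grep -c '(deleted)')
echo "deleted fds held: $n"
```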
Re: Migration LCS from 1.2.X to 2.0.x exception
this is the issue: https://issues.apache.org/jira/browse/CASSANDRA-5383

guess it fell between the chairs, will poke around

On Tue, Sep 24, 2013 at 4:26 PM, Nate McCall n...@thelastpickle.com wrote:

What version of 1.2.x? Unfortunately, you must go through 1.2.9 first. See https://github.com/apache/cassandra/blob/cassandra-2.0.0/NEWS.txt#L19-L24

On Tue, Sep 24, 2013 at 8:57 AM, Desimpel, Ignace ignace.desim...@nuance.com wrote:

Tested on WINDOWS: on startup of the 2.0.0 version from 1.2.x files I get an error as listed below.

This is due to the code in LeveledManifest::mutateLevel. The method already has a comment saying that it is scary… On Windows, one cannot use File::rename if the target file name already exists. Also, even on Linux, I'm not sure if a rename would actually 'overwrite/implicit-delete' the content of the target file.

Anyway, adding code (below) before the FileUtils.renameWithConfirm should work in both cases (maybe even rename the fromFile to be able to recover…):

File oTo = new File(filename);
if (oTo.exists())
    oTo.delete();

java.lang.RuntimeException: Failed to rename …..xxx\Function-ic-10-Statistics.db-tmp to …..xxx\Function-ic-10-Statistics.db
    at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:136) ~[main/:na]
    at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:125) ~[main/:na]
    at org.apache.cassandra.db.compaction.LeveledManifest.mutateLevel(LeveledManifest.java:601) ~[main/:na]
    at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:103) ~[main/:na]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:247) ~[main/:na]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:443) ~[main/:na]

Regards,

Ignace Desimpel
Re: 1.2.10 -> 2.0.1 migration issue
this is most likely a bug, filed https://issues.apache.org/jira/browse/CASSANDRA-6093 and will try to have a look today.

On Wed, Sep 25, 2013 at 1:48 AM, Christopher Wirt chris.w...@struq.com wrote:

Hi,

Just had a go at upgrading a node to the latest stable c* 2 release and think I ran into some issues with manifest migration.

On initial start up I hit this error as it starts to load the first of my CF:

INFO [main] 2013-09-24 22:56:01,018 LegacyLeveledManifest.java (line 89) Migrating manifest for struqrealtime/impressionstorev2
INFO [main] 2013-09-24 22:56:01,019 LegacyLeveledManifest.java (line 119) Snapshotting struqrealtime, impressionstorev2 to pre-sstablemetamigration
ERROR [main] 2013-09-24 22:56:01,030 CassandraDaemon.java (line 459) Exception encountered during startup
FSWriteError in /disk1/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/impressionstorev2.json
    at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:83)
    at org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:138)
    at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:246)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)
Caused by: java.nio.file.NoSuchFileException: /disk1/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/impressionstorev2.json -> /disk1/cassandra/data/struqrealtime/impressionstorev2/impressionstorev2.json
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:474)
    at java.nio.file.Files.createLink(Files.java:1037)
    at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:79)
    ... 5 more

I had already successfully run a test migration on our dev server. Only real difference I can see is the number of data directories defined and the amount of data being held.

I've run upgradesstables under 1.2.10. I have always been using vnodes and CQL3. I recently moved to using LZ4 instead of Snappy.

I tried to startup again and it gave me a slightly different error:

INFO [main] 2013-09-24 22:58:28,218 LegacyLeveledManifest.java (line 89) Migrating manifest for struqrealtime/impressionstorev2
INFO [main] 2013-09-24 22:58:28,218 LegacyLeveledManifest.java (line 119) Snapshotting struqrealtime, impressionstorev2 to pre-sstablemetamigration
ERROR [main] 2013-09-24 22:58:28,222 CassandraDaemon.java (line 459) Exception encountered during startup
java.lang.RuntimeException: Tried to create duplicate hard link to /disk3/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/struqrealtime-impressionstorev2-ic-1030-TOC.txt
    at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:71)
    at org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:129)
    at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:246)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)

Will have a go recreating this tomorrow.

Any insight or guesses at what the issue might be are always welcome.

Thanks, Chris
Re: 1.2.10 -> 2.0.1 migration issue
cant really reproduce, could you update the ticket with a bit more info about your setup?

do you have multiple .json files in your data dirs?

On Wed, Sep 25, 2013 at 10:07 AM, Marcus Eriksson krum...@gmail.com wrote:

this is most likely a bug, filed https://issues.apache.org/jira/browse/CASSANDRA-6093 and will try to have a look today.
Re: 1.2.10 -> 2.0.1 migration issue
you are probably reading trunk NEWS.txt read the ticket for explanation of what the issue was (it is a proper bug) On Wed, Sep 25, 2013 at 12:59 PM, Christopher Wirt chris.w...@struq.comwrote: Hi Marcus, Thanks for having a look at this. ** ** Just noticed this in the NEWS.txt ** ** For *leveled* compaction users, 2.0 must be *atleast* started before upgrading to 2.1 due to the fact that the old JSON *leveled* manifest is migrated into the *sstable* *metadata* files on startup** ** in 2.0 and this code is gone from 2.1. ** ** Basically, my fault for skimming over this too quickly. ** ** We will move from 1.2.10 - 2.0 - 2.1 ** ** Thanks, Chris ** ** ** ** *From:* Marcus Eriksson [mailto:krum...@gmail.com] *Sent:* 25 September 2013 09:37 *To:* user@cassandra.apache.org *Subject:* Re: 1.2.10 - 2.0.1 migration issue ** ** cant really reproduce, could you update the ticket with a bit more info about your setup? ** ** do you have multiple .json files in your data dirs? ** ** On Wed, Sep 25, 2013 at 10:07 AM, Marcus Eriksson krum...@gmail.com wrote: this is most likely a bug, filed https://issues.apache.org/jira/browse/CASSANDRA-6093 and will try to have a look today. ** ** On Wed, Sep 25, 2013 at 1:48 AM, Christopher Wirt chris.w...@struq.com wrote: Hi, Just had a go at upgrading a node to the latest stable c* 2 release and think I ran into some issues with manifest migration. On initial start up I hit this error as it starts to load the first of my CF. 
INFO [main] 2013-09-24 22:56:01,018 LegacyLeveledManifest.java (line 89) Migrating manifest for struqrealtime/impressionstorev2 INFO [main] 2013-09-24 22:56:01,019 LegacyLeveledManifest.java (line 119) Snapshotting struqrealtime, impressionstorev2 to pre-sstablemetamigration* *** ERROR [main] 2013-09-24 22:56:01,030 CassandraDaemon.java (line 459) Exception encountered during startup FSWriteError in /disk1/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/impressionstorev2.json at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:83)** ** at org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:138) at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:246) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485) Caused by: java.nio.file.NoSuchFileException: /disk1/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/impressionstorev2.json - /disk1/cassandra/data/struqrealtime/impressionstorev2/impressionstorev2.json at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:474) at java.nio.file.Files.createLink(Files.java:1037) at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:79)** ** ... 5 more I had already successful run a test migration on our dev server. Only real difference I can see if the number of data directories defined and the amount of data being held. I’ve run upgradesstables under 1.2.10. I have always been using vnodes and CQL3. I recently moved to using LZ4 instead of Snappy.. 
I tried to start up again and it gave me a slightly different error:

INFO [main] 2013-09-24 22:58:28,218 LegacyLeveledManifest.java (line 89) Migrating manifest for struqrealtime/impressionstorev2
INFO [main] 2013-09-24 22:58:28,218 LegacyLeveledManifest.java (line 119) Snapshotting struqrealtime, impressionstorev2 to pre-sstablemetamigration
ERROR [main] 2013-09-24 22:58:28,222 CassandraDaemon.java (line 459) Exception encountered during startup
java.lang.RuntimeException: Tried to create duplicate hard link to /disk3/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/struqrealtime-impressionstorev2-ic-1030-TOC.txt
        at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:71)
        at org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:129)
        at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91
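Marcus's question above -- whether more than one legacy leveled-manifest .json exists across the data directories -- can be checked with a short script. This is only a sketch: the directory layout below is a hypothetical stand-in for the real /disk1..3/cassandra/data tree, and the keyspace/CF names are taken from the log lines for illustration.

```python
# Count legacy leveled-manifest JSON files across several data directories.
# The temp tree below is a hypothetical stand-in for /disk{1,2,3}/cassandra/data.
import os
import tempfile

def find_manifests(data_dirs):
    """Return the paths of all .json manifest files under the data dirs."""
    found = []
    for root_dir in data_dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            found.extend(os.path.join(dirpath, name)
                         for name in filenames if name.endswith(".json"))
    return found

# Build a toy layout: only disk1 holds the manifest, as in Chris's report.
root = tempfile.mkdtemp()
cf_dirs = []
for disk in ("disk1", "disk2", "disk3"):
    cf_dir = os.path.join(root, disk, "struqrealtime", "impressionstorev2")
    os.makedirs(cf_dir)
    cf_dirs.append(cf_dir)
open(os.path.join(cf_dirs[0], "impressionstorev2.json"), "w").close()

manifests = find_manifests(cf_dirs)
print(f"found {len(manifests)} manifest file(s)")  # more than one would be suspicious
```

More than one manifest for the same CF would point at the kind of multi-data-dir confusion CASSANDRA-6093 describes.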
Re: 1.2.10 - 2.0.1 migration issue
you probably have to remove the old snapshots before trying to restart

On Wed, Sep 25, 2013 at 3:05 PM, Christopher Wirt chris.w...@struq.com wrote:

Should also say: I have managed to move one node from 1.2.10 to 2.0.0. I'm seeing this error on the machine I tried to migrate earlier to 2.0.1.

Thanks

From: Christopher Wirt [mailto:chris.w...@struq.com]
Sent: 25 September 2013 14:04
To: 'user@cassandra.apache.org'
Subject: RE: 1.2.10 - 2.0.1 migration issue

Hi Marcus,

I've seen your patch. This fits with what I'm seeing: the first data directory only contained the JSON manifest at that time.

As a workaround I've made sure that each of the snapshot directories now exists before starting up.

I still end up with the second exception I posted regarding a duplicate hard link. Possibly two unrelated exceptions.

After getting this error, looking at the data dirs:

Data1 contains: JSON manifests, loads of data files, the snapshot directory
Data2 contains: just the snapshot directory
Data3 contains: just the snapshot directory

INFO 12:56:22,766 Migrating manifest for struqrealtime/impressionstorev2
INFO 12:56:22,767 Snapshotting struqrealtime, impressionstorev2 to pre-sstablemetamigration
ERROR 12:56:22,787 Exception encountered during startup
java.lang.RuntimeException: Tried to create duplicate hard link to /disk1/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/impressionstorev2.json
        at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:71)
        at org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:138)
        at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:247)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:443)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:486)
java.lang.RuntimeException: Tried to create duplicate hard link to /disk1/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/impressionstorev2.json
        at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:71)
        at org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:138)
        at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:247)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:443)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:486)
Exception encountered during startup: Tried to create duplicate hard link to /disk1/cassandra/data/struqrealtime/impressionstorev2/snapshots/pre-sstablemetamigration/impressionstorev2.json

Thanks,
Chris
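The workaround described above -- clearing any stale pre-sstablemetamigration snapshot from a failed attempt (as Marcus suggests) and then making sure the snapshot directory exists in every data directory before restarting -- can be sketched like this. All paths here are hypothetical stand-ins for the real /disk{1,2,3}/cassandra/data layout:

```python
# Sketch of the workaround: remove stale migration snapshots, then pre-create
# the snapshot directory in every data dir so the hard-link step has a target.
# The temp tree is a hypothetical stand-in for /disk{1,2,3}/cassandra/data.
import os
import shutil
import tempfile

SNAPSHOT = os.path.join("snapshots", "pre-sstablemetamigration")

root = tempfile.mkdtemp()
cf_dirs = [os.path.join(root, disk, "struqrealtime", "impressionstorev2")
           for disk in ("disk1", "disk2", "disk3")]

for cf_dir in cf_dirs:
    snap_dir = os.path.join(cf_dir, SNAPSHOT)
    shutil.rmtree(snap_dir, ignore_errors=True)  # drop any stale snapshot
    os.makedirs(snap_dir)                        # ensure the directory exists
```

On a real node this would be done with the node stopped, against the directories listed in cassandra.yaml's data_file_directories.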
Re: manually removing sstable
yep that works, you need to remove all components of the sstable though, not just -Data.db

and, in 2.0 there is this: https://issues.apache.org/jira/browse/CASSANDRA-5228

/Marcus

On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg t...@iconara.net wrote:

Hi,

I think I remember reading that if you have sstables that you know contain only data whose TTL has expired, it's safe to remove them manually by stopping C*, removing the *-Data.db files and then starting up C* again. Is this correct? We have a cluster where everything is written with a TTL, and sometimes C* needs to compact over 100 GB of sstables where we know everything has expired, and we'd rather just manually get rid of those.

T#
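Marcus's point is that an sstable is a set of files sharing one generation prefix (-Data.db, -Index.db, -Filter.db, -Statistics.db, -CompressionInfo.db, and so on), and all of them must go together while the node is stopped. A minimal sketch against dummy files; the keyspace/CF/generation names are hypothetical:

```python
# Remove every component of one sstable generation, not just -Data.db.
# Filenames are hypothetical; this runs against a throwaway directory.
import glob
import os
import tempfile

data_dir = tempfile.mkdtemp()  # stand-in for the CF's data directory

# Simulate the on-disk components of generation 1398, plus one live table.
for comp in ("Data", "Index", "Filter", "Statistics", "CompressionInfo"):
    open(os.path.join(data_dir, f"whatever-he-1398-{comp}.db"), "w").close()
open(os.path.join(data_dir, "whatever-he-1399-Data.db"), "w").close()

# With the node stopped, delete ALL components of the expired generation.
for path in glob.glob(os.path.join(data_dir, "whatever-he-1398-*")):
    os.remove(path)

print(sorted(os.listdir(data_dir)))  # only the 1399 generation remains
```

Removing only -Data.db would leave orphaned index/filter components behind, which is exactly what Marcus warns against.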
Re: old data / tombstones are not deleted after ttl
you could consider enabling leveled compaction: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

On Tue, Mar 5, 2013 at 9:46 AM, Matthias Zeilinger matthias.zeilin...@bwinparty.com wrote:

Short question afterwards: I have read in the documentation that after a major compaction, minor compactions are no longer automatically triggered. Does this mean that I have to run nodetool compact regularly? Or is there a way to get back to automatic minor compactions?

Thx,
Br,
Matthias Zeilinger
Production Operation – Shared Services
P: +43 (0) 50 858-31185
M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.com
bwin.party services (Austria) GmbH
Marxergasse 1B
A-1030 Vienna
www.bwinparty.com

-----Original Message-----
From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com]
Sent: Dienstag, 05. März 2013 08:03
To: user@cassandra.apache.org
Subject: RE: old data / tombstones are not deleted after ttl

Yes, it was a major compaction. I know it's not a great solution, but I needed something to get rid of the old data because I ran out of disk space.

Br,
Matthias Zeilinger

-----Original Message-----
From: Michal Michalski [mailto:mich...@opera.com]
Sent: Dienstag, 05. März 2013 07:47
To: user@cassandra.apache.org
Subject: Re: old data / tombstones are not deleted after ttl

Was it a major compaction? I ask because it's definitely a solution that had to work, but it's also a solution that - in general - probably no-one here would suggest you use.

M.

On 05.03.2013 07:08, Matthias Zeilinger wrote:

Hi,

I have done a manual compaction via nodetool and this worked.
But thx for the explanation of why it wasn't compacted.

Br,
Matthias Zeilinger

From: Bryan Talbot [mailto:btal...@aeriagames.com]
Sent: Montag, 04. März 2013 23:36
To: user@cassandra.apache.org
Subject: Re: old data / tombstones are not deleted after ttl

Those older files won't be included in a compaction until there are min_compaction_threshold (4) files of that size. When you get another sstable -Data.db file that is about 12-18 GB, you'll have 4 and they will be compacted together into one new file. At that time, if there are any rows with only tombstones that are all older than gc_grace, the row will be removed (assuming the row exists exclusively in the 4 input sstables). Columns with data that is more than TTL seconds old will be written with a tombstone. If the row does have column values in sstables that are not being compacted, the row will not be removed.

-Bryan

On Sun, Mar 3, 2013 at 11:07 PM, Matthias Zeilinger matthias.zeilin...@bwinparty.com wrote:

Hi,

I'm running Cassandra 1.1.5 and have the following issue: I'm using a 10-day TTL on my CF. I can see a lot of tombstones in there, but they aren't deleted after compaction. I have tried a nodetool cleanup and also a restart of Cassandra, but nothing happened.

total 61G
drwxr-xr-x  2 cassandra dba  20K Mar  4 06:35 .
drwxr-xr-x 10 cassandra dba 4.0K Dec 10 13:05 ..
-rw-r--r--  1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-CompressionInfo.db
-rw-r--r--  1 cassandra dba  19G Dec 15 22:04 whatever-he-1398-Data.db
-rw-r--r--  1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-Filter.db
-rw-r--r--  1 cassandra dba 357M Dec 15 22:04 whatever-he-1398-Index.db
-rw-r--r--  1 cassandra dba 4.3K Dec 15 22:04 whatever-he-1398-Statistics.db
-rw-r--r--  1 cassandra dba 9.5M Feb  6 15:45 whatever-he-5464-CompressionInfo.db
-rw-r--r--  1 cassandra dba  12G Feb  6 15:45 whatever-he-5464-Data.db
-rw-r--r--  1 cassandra dba  48M Feb  6 15:45 whatever-he-5464-Filter.db
-rw-r--r--  1 cassandra dba 736M Feb  6 15:45 whatever-he-5464-Index.db
-rw-r--r--  1 cassandra dba 4.3K Feb  6 15:45 whatever-he-5464-Statistics.db
-rw-r--r--  1 cassandra dba 9.7M Feb 21 19:13 whatever-he-6829-CompressionInfo.db
-rw-r--r--  1 cassandra dba  12G Feb 21 19:13 whatever-he-6829-Data.db
-rw-r--r--  1 cassandra dba  47M Feb 21 19:13 whatever-he-6829-Filter.db
-rw-r--r--  1 cassandra dba 792M Feb 21 19:13 whatever-he-6829-Index.db
-rw-r--r--  1 cassandra dba 4.3K Feb 21 19:13 whatever-he-6829-Statistics.db
-rw-r--r--  1 cassandra dba 3.7M Mar  1 10:46 whatever-he-7578-CompressionInfo.db
-rw-r--r--  1 cassandra dba 4.3G Mar  1 10:46 whatever-he-7578-Data.db
-rw-r--r--  1 cassandra dba  12M Mar  1 10:46
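Bryan's explanation above -- that the large tables in this listing won't compact until a fourth file of similar size appears -- can be illustrated with a toy size-tiered bucketing sketch. This is a simplification, not Cassandra's actual algorithm; the bucketing ratio and threshold are illustrative:

```python
# Toy illustration of size-tiered compaction triggering (simplified; not
# Cassandra's real implementation). SSTables are grouped into buckets of
# similar size, and a bucket becomes a compaction candidate only once it
# holds at least min_compaction_threshold (default 4) tables.
MIN_COMPACTION_THRESHOLD = 4

def buckets(sizes_gb, ratio=1.5):
    """Group sizes into buckets whose members stay within `ratio` of the
    bucket's running average (a crude approximation of the real rule)."""
    out = []
    for size in sorted(sizes_gb):
        for bucket in out:
            avg = sum(bucket) / len(bucket)
            if avg / ratio <= size <= avg * ratio:
                bucket.append(size)
                break
        else:
            out.append([size])
    return out

def compaction_candidates(sizes_gb):
    return [b for b in buckets(sizes_gb) if len(b) >= MIN_COMPACTION_THRESHOLD]

# The listing above: 19 GB, 12 GB, 12 GB and 4.3 GB -> nothing to compact yet.
print(compaction_candidates([19, 12, 12, 4.3]))   # -> []
# One more similarly sized table makes a 4-file bucket eligible.
print(compaction_candidates([19, 12, 12, 15]))    # -> [[12, 12, 15, 19]]
```

Until that fourth similarly sized file shows up, the expired data in the 12-19 GB tables simply never gets a chance to be purged, which matches what Matthias observed.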
Re: how stable is 1.0 these days?
beware of https://issues.apache.org/jira/browse/CASSANDRA-3820 though, if you have many keys per node

other than that, yep, it seems solid

/Marcus

On Wed, Feb 29, 2012 at 6:20 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:

Thanks! We will test it on our test cluster in the coming weeks and hopefully put it into production on our 200-node main cluster. :)

Thibaut

On Wed, Feb 29, 2012 at 5:52 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Wed, Feb 29, 2012 at 10:35 AM, Thibaut Britz thibaut.br...@trendiction.com wrote:

Any more feedback on larger deployments of 1.0.*? We are eager to try out the new features in production, but don't want to run into bugs as on the former 0.7 and 0.8 versions.

Thanks,
Thibaut

On Tue, Jan 31, 2012 at 6:59 AM, Ben Coverston ben.covers...@datastax.com wrote:

I'm not sure what Carlo is referring to, but generally if you have done thousands of migrations you can end up in a situation where the migrations take a long time to replay, and there are some race conditions that can be problematic when thousands of migrations may need to be replayed while a node is bootstrapped. If you get into this situation it can be fixed by copying migrations from a known good schema to the node that you are trying to bootstrap. Generally I would advise against frequent schema updates. Unlike rows in column families, the schema itself is designed to be relatively static.

On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham jnews...@referentia.com wrote:

Could you also elaborate for creating/dropping column families? We're currently working on moving to 1.0 and using dynamically created tables, so I'm very interested in what issues we might encounter. So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is that dropping a CF may sometimes fail with UnavailableException. I think this happens when the CF is busy being compacted. When I sleep/retry within a loop it eventually succeeds.
Thanks,
Jim

On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:

Can you elaborate on the composite types instabilities? Is this specific to Hector, as Radim's posts suggest? These one-liner answers are quite stressful :)

On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pires carlopi...@gmail.com wrote:

If you need to use composite types and create/drop column families on the fly, you must be prepared for instabilities.

--
Ben Coverston
DataStax -- The Apache Cassandra Company

I would call 1.0.7 rock fricken solid. Incredibly stable. It has been that way since I updated to 0.8.8, really. TBs of data, billions of requests a day, and thanks to JAMM, memtable auto-tuning, and other enhancements I rarely, if ever, find a node in a state where it requires a restart. My clusters are beast-ing. There are always bugs in software, but this is coming from a guy who ran Cassandra 0.6.1. Administration on my Cassandra cluster is like a vacation now.