Re: [EXTERNAL] Re: How bottom of cassandra save data efficiently?

2020-01-02 Thread lampahome
Thank you all.

I found the DataStax docs, and they say compression is enabled by default and
set to LZ4Compressor.
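
For anyone else looking: the setting shows up in the table definition
(keyspace and table names below are placeholders):

    cqlsh -e "DESCRIBE TABLE my_keyspace.my_table;"
    # the output should include something like:
    # compression = {'chunk_length_in_kb': '64',
    #                'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}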


Re: repair failed

2020-01-02 Thread Ben Mills
Hi Oliver,

I don't have a quick answer (or any answer yet), though we ran into a
similar issue and I'm wondering about your environment and some configs.

- Operating system?
- Cloud or on-premise?
- Version of Cassandra?
- Version of Java?
- Compaction strategy?
- Primarily read or primarily write (or a blend of both)?
- How much memory allocated to heap?
- How long do all the repair commands typically take per node?

nodetool repair -full -dcpar will stream data across data centers - is it
possible that the number of nodes, or the amount of data, or the number of
keyspaces has grown enough over time to cause streaming issues (and
timeouts)?
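
For comparison, a common alternative (a sketch; the keyspace name is a
placeholder) is a primary-range repair run on every node in turn, so each
token range is repaired exactly once:

    # run on each node, one node at a time
    nodetool repair -full -pr my_keyspace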

You wrote:

Is it problematic if the repair is started only on one node?

Are you asking whether it's ok to run -full repairs one node at a time (on
all nodes)? Or are you saying that you are only repairing one node in each
cluster or DC?

Thanks,
Ben




On Sun, Dec 29, 2019 at 3:54 AM gloCalHelp.com wrote:

> TO Oliver :
> Maybe the repair should be executed only after all the data in the
> memtables has been flushed to disk?
>
>
> Sincerely yours,
> Georgelin
> www_8ems_...@sina.com
> mobile:0086 180 5986 1565
>
>
> ----- Original Message -----
> From: Oliver Herrmann 
> To: user@cassandra.apache.org
> Subject: repair failed
> Date: 2019-12-28 23:15
>
> Hello,
>
> today, for the second time, our weekly repair job failed after working for
> many months without a problem. We have multiple Cassandra nodes in two
> data centers.
>
> The repair command is started only on one node with the following
> parameters:
>
> nodetool repair -full -dcpar
>
> Is it problematic if the repair is started only on one node?
>
> The repair fails after one hour with the following error message:
>
>  failed with error Could not create snapshot at /192.168.13.232
> (progress: 0%)
> [2019-12-28 05:00:04,295] Some repair failed
> [2019-12-28 05:00:04,296] Repair command #1 finished in 1 hour 0 minutes 2
> seconds
> error: Repair job has failed with the error message: [2019-12-28
> 05:00:04,295] Some repair failed
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error message:
> [2019-12-28 05:00:04,295] Some repair failed
> at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:116)
> at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(Unknown Source)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(Unknown Source)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(Unknown Source)
> at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(Unknown Source)
>
> In the logfile on 192.168.13.232 which is in the second data center I
> could find only in debug.log the following log messages:
> DEBUG [COMMIT-LOG-ALLOCATOR] 2019-12-28 04:21:20,143
> AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating
> a fresh one
> DEBUG [MessagingService-Outgoing-192.168.13.120-Small] 2019-12-28
> 04:31:00,450 OutboundTcpConnection.java:410 - Socket to 192.168.13.120 closed
> DEBUG [MessagingService-Outgoing-192.168.13.120-Small] 2019-12-28
> 04:31:00,450 OutboundTcpConnection.java:349 - Error writing to 192.168.13.120
> java.io.IOException: Connection timed out
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> ~[na:1.8.0_111]
>
> We tried to run repair a few more times but it always failed with the same
> error. After restarting all nodes it was finally successful.
>
> Any idea what could be wrong?
>
> Regards
> Oliver
>


RE: [EXTERNAL] Re: How bottom of cassandra save data efficiently?

2020-01-02 Thread Durity, Sean R
100,000 rows is pretty small. Import your data to your cluster, do a nodetool 
flush on each node, then you can see how much disk space is actually used.
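
Something like this on each node (names and paths below are just examples,
and older releases spell tablestats as cfstats):

    nodetool flush my_keyspace
    nodetool tablestats my_keyspace.my_table    # check "Space used (live)"
    du -sh /var/lib/cassandra/data/my_keyspace/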

There are different compression options available to you when you create the
table. It also matters whether the rows are in separate partitions or many
rows share a partition. In one exercise I did, the same data set grew from
0.3 MB (many rows per partition) to 20 MB (one row per partition). Your
compaction settings can also change the size of data on disk.
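
For example, the compressor is chosen per table. This is just a sketch, and
the available classes vary by version (LZ4, Snappy, and Deflate have been
around for a long time; Zstd arrived in 4.0):

    CREATE TABLE my_keyspace.my_table (
        id uuid PRIMARY KEY,
        payload blob
    ) WITH compression = {'class': 'DeflateCompressor',
                          'chunk_length_in_kb': 64};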

Bottom line – precise math requires more parameters than you have given. Actual 
experimentation is easier.


Sean Durity

From: lampahome 
Sent: Wednesday, January 1, 2020 8:33 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: How bottom of cassandra save data efficiently?



Dipan Shah <dipan@hotmail.com> wrote on Tue, 31 Dec 2019 at 5:34 PM:
Hello lampahome,

Data will be compressed but you will also have to account for the replication 
factor that you will be using.
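
For example, the replication factor is set per keyspace (a sketch; the data
center name and factor are made up):

    CREATE KEYSPACE demo
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
    -- with RF=3, every row is stored on three nodes, so the raw footprint
    -- across the cluster is roughly three times the single-copy size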


Thanks,

Dipan Shah


So the key factor for storage efficiency is the replication factor. Are there
other factors?






Re: How bottom of cassandra save data efficiently?

2020-01-02 Thread Reid Pinchback
As others pointed out, compression will reduce the size and replication will 
(across nodes) increase the total size.

The other thing to note is that you can have multiple versions of the data in
different sstables, plus tombstones related to deletions and TTLs, indexes,
any snapshots, and room for the temporary artifacts of compaction. If you are
just trying to get a quick guesstimate of your space needs, I'd probably use
your uncompressed calculation as a heuristic for the per-node storage
required.
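
To make that concrete with the numbers from the original question (and
assuming RF=3 purely for illustration):

    5.6 MB raw x 3 replicas     = ~17 MB across the cluster, pre-compression
    + compaction headroom       (often budgeted at up to ~2x per node for
                                 size-tiered compaction)
    + snapshots, if taken       (hard links at first, growing as the original
                                 sstables are compacted away)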

From: lampahome 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, December 30, 2019 at 9:37 PM
To: "user@cassandra.apache.org" 
Subject: How bottom of cassandra save data efficiently?

Message from External Sender
If I use var a as the primary key and var b as the second (clustering) key,
and a and b are 16 bytes and 8 bytes respectively.

And other data are 32 bytes.

In one row, I have a+b+data = 16+8+32 = 56 bytes.
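
For reference, the table would look roughly like this (names made up):

    CREATE TABLE demo.kv (
        a    blob,    -- 16-byte partition key
        b    bigint,  -- 8-byte clustering key
        data blob,    -- 32-byte value
        PRIMARY KEY (a, b)
    );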

If I have 100,000 rows to store in Cassandra, will it occupy 56 x 100,000 =
5,600,000 bytes (about 5.6 MB) on my disk? Or will the data be compressed?

thx


RE: [EXTERNAL] Re: Facing issues while starting Cassandra

2020-01-02 Thread Durity, Sean R
Any read-only file systems? Have you tried to start from the command line 
(instead of a service)? Sometimes that will give a more helpful error when 
start-up can’t complete.
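
For example (a sketch; binary and user names differ between package and
tarball installs):

    # run in the foreground as the cassandra user to see errors directly
    sudo -u cassandra /usr/sbin/cassandra -f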

If your error is literally what you included, it looks like the executable 
can’t find the cassandra.yaml file.

I will agree with Jeff, though. When I have seen a similar error it has usually 
been a yaml violation, such as having a tab (instead of spaces) in the yaml 
file. Check that specific node’s file with a yaml lint detector?
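
A quick syntax check, assuming a Python interpreter with PyYAML is available
(the yaml path is just an example):

    python -c "import sys, yaml; yaml.safe_load(open(sys.argv[1]))" \
        /etc/cassandra/cassandra.yaml && echo "yaml OK"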

Sean Durity

From: Inquistive allen 
Sent: Tuesday, December 24, 2019 2:01 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Facing issues while starting Cassandra

Hello Osman,

Thanks for the suggestion.
I did try "export LC_ALL=C"
It didn't help.

Thanks

On Tue, 24 Dec, 2019, 12:05 PM Osman Yozgatlıoğlu
<osman.yozgatlio...@gmail.com> wrote:
I faced similar issues with different locale settings.
Could you try following command before running?
export LC_ALL=C;

Regards,
Osman

On Tue, 24 Dec 2019 at 09:01, Inquistive allen <inquial...@gmail.com> wrote:
>
> Hello Jeff,
>
> Thanks for responding.
> I have validated the cassandra.yaml file with other hosts in the cluster.
> There is no difference. I copied a yaml file from another node to this node
> and changed the required configs. Still facing the same issue.
> The server went down for patching and, after coming back up, Cassandra
> doesn't seem to start.
> Having looked for solutions on Google, I found that it might be a problem
> with the /tmp directory where the classes are stored.
> Each time I try starting Cassandra, a new directory is created in /tmp, but
> nothing is inside it. After some time, the node goes down.
>
> I believe there is something to do with the /tmp directory.
> Request you to comment on the same.
>
> Thanks
>
> On Tue, 24 Dec, 2019, 3:42 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>> Are you able to share the yaml? Almost certainly something in it that’s 
>> invalid.
>>
>> On Dec 23, 2019, at 12:51 PM, Inquistive allen <inquial...@gmail.com> wrote:
>>
>> 
>> Hello Team,
>>
>> I am facing issues while starting Cassandra.
>>
>> Caused by: org.apache.cassandra.exceptions.ConfigurationException: Invalid
>> yaml: file: /path/to/yaml
>> Error: null; Can't construct a java object for
>> tag:yaml.org,2002:org.apache.cassandra.config.Config;
>> exception=java.lang.reflect.InvocationTargetException
>>
>> Request to comment on how to resolve the issue.
>>
>> Thanks & Regards
>> Allen



