This isn't an HDFS mailing list.

On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle <daeme...@gmail.com> wrote:
> No, 3 TB is small. 30-50 TB of HDFS space per HDFS node is typical these
> days. It depends somewhat on whether there is a mix of more and less
> frequently accessed data, but even when storing only hot data, I never saw
> anything less than 20 TB of HDFS per node.
>
> *Daemeon C.M. Reiydelle*
> *USA (+1) 415.501.0198*
> *London (+44) (0) 20 8144 9872*
>
> *"All men dream, but not equally. Those who dream by night in the dusty
> recesses of their minds wake up in the day to find it was vanity, but the
> dreamers of the day are dangerous men, for they may act their dreams with
> open eyes, to make it possible." - T.E. Lawrence*
>
> On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli <tbarbu...@gmail.com>
> wrote:
>
>> Am I the only one thinking 3 TB is way too much data for a single node on
>> a VM?
>>
>> On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>>> I don't believe incremental repair is enabled. I have never enabled it
>>> on the cluster, so unless it's the default, it is off. I also don't see
>>> a setting for it in cassandra.yaml.
>>>
>>> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com>
>>> wrote:
>>>
>>>> Unless there is a bug, snapshots are excluded from the load reported
>>>> by nodetool status (they are not HDFS anyway!).
>>>>
>>>> Out of curiosity, is incremental repair enabled? This is almost
>>>> certainly a rat hole, but a few releases back there was an issue where
>>>> load would only increase until the node was restarted. It was fixed ages
>>>> ago, but I wonder what happens if you restart a node, IF you have
>>>> incremental repair enabled.
>>>>
>>>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:
>>>>
>>>> Can you please check whether you have incremental backups enabled and
>>>> whether snapshots are occupying the space?
>>>>
>>>> Run the nodetool clearsnapshot command.
>>>>
>>>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
>>>> dan...@sendwithus.com> wrote:
>>>>
>>>> It's 3-4 TB per node, and by "load rises" I mean the load as reported
>>>> by nodetool status.
>>>>
>>>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>>>> wrote:
>>>>
>>>> When you say "the load rises ...", could you clarify what you mean by
>>>> "load"? That term has a specific meaning in Linux, and another in, e.g.,
>>>> Cloudera Manager. But in neither case would it be relevant to transient
>>>> or persisted disk. Am I missing something?
>>>>
>>>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <tbarbu...@gmail.com>
>>>> wrote:
>>>>
>>>> 3-4 TB per node or in total?
>>>>
>>>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <
>>>> dan...@sendwithus.com> wrote:
>>>>
>>>> I should also mention that I am running Cassandra 3.10 on the cluster.
>>>>
>>>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>>>> wrote:
>>>>
>>>> The cluster is running with RF=3, and right now each node is storing
>>>> about 3-4 TB of data. I'm using r4.2xlarge EC2 instances, which have
>>>> 8 vCPUs and 61 GB of RAM, with gp2 SSD EBS volumes (10k IOPS) attached
>>>> as the data drives. I guess this raises the question: what's a good
>>>> marker for deciding whether to increase disk space or provision a new
>>>> node?
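Part of Daniel's question is plain arithmetic. With RF=3, every row is stored on three nodes, so 6 nodes holding roughly 3.5 TB each (the midpoint of the 3-4 TB he reports) amount to about 6 × 3.5 / 3 = 7 TB of unique data. Adding nodes spreads the resulting 21 TB of replicated data more thinly. The Python sketch below assumes balanced token ownership and ignores compaction headroom, snapshots and the load-reporting anomaly discussed in this thread. The 2 TB per-node target is a hypothetical figure for illustration, not a recommendation from anyone in the thread.

```python
# Sketch: how per-node load changes as nodes are added.
# Assumptions: tokens are balanced, so every node owns an equal share, and each
# row is stored RF times. Compaction headroom and snapshots are ignored.


def unique_data_tb(nodes: int, load_per_node_tb: float, rf: int) -> float:
    """Data before replication: total stored across the cluster divided by RF."""
    return nodes * load_per_node_tb / rf


def per_node_tb(unique_tb: float, rf: int, nodes: int) -> float:
    """Expected load on each node for a given node count."""
    if rf > nodes:
        raise ValueError("RF cannot exceed the number of nodes")
    return unique_tb * rf / nodes


def nodes_needed(unique_tb: float, rf: int, max_per_node_tb: float) -> int:
    """Smallest node count that keeps per-node load at or below the target."""
    if max_per_node_tb <= 0:
        raise ValueError("target must be positive")
    nodes = rf  # a cluster needs at least RF nodes to hold RF replicas
    while per_node_tb(unique_tb, rf, nodes) > max_per_node_tb:
        nodes += 1
    return nodes


if __name__ == "__main__":
    # Figures from the thread: 6 nodes, RF=3, about 3.5 TB per node.
    unique = unique_data_tb(nodes=6, load_per_node_tb=3.5, rf=3)
    print(f"unique data: {unique:.1f} TB")
    for n in (6, 8, 10, 12):
        print(f"{n:>2} nodes -> {per_node_tb(unique, 3, n):.2f} TB per node")
    # Hypothetical target of 2 TB per node, for illustration only.
    print("nodes needed for <= 2 TB per node:", nodes_needed(unique, 3, 2.0))
```

With these figures the loop stops at 11 nodes, because 10 nodes would still carry 2.1 TB each.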
>>>>
>>>> On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Daniel,
>>>>
>>>> This is not normal. Possibly a capacity problem. What's the RF, how much
>>>> data do you store per node, and what kind of servers do you use (core
>>>> count, RAM, disk, ...)?
>>>>
>>>> Cheers,
>>>> Tommaso
>>>>
>>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <
>>>> dan...@sendwithus.com> wrote:
>>>>
>>>> I am running a 6-node cluster, and I have noticed that the reported
>>>> load on each node rises throughout the week and grows well past the
>>>> actual disk space used and available on each node. Eventually, latency
>>>> for operations also suffers and the nodes have to be restarted. A couple
>>>> of questions on this: is this normal? Does Cassandra need to be
>>>> restarted every few days for best performance? Any insight into this
>>>> behaviour would be helpful.
>>>>
>>>> Cheers,
>>>> Daniel
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
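Daniel's original report is that the Load column of nodetool status keeps climbing past the disk space actually in use. Varun suggests checking whether snapshots or incremental backups are holding that space. One way to check both on a single node is to total the files under the data directory and compare that figure with the Load value. The Python sketch below is a rough diagnostic under stated assumptions, not a Cassandra tool. It assumes the default package data directory /var/lib/cassandra/data and a layout of <keyspace>/<table>/ directories with snapshots/ and backups/ subdirectories. The script name load_check.py is hypothetical. Snapshots and incremental backups are hard links, so a file in those directories only counts as extra space once its inode is no longer shared with a live SSTable.

```python
import os
import re
import sys
from pathlib import Path

# Cassandra prints sizes using powers of 1024. Recent versions label them
# "KiB"/"GiB"/"TiB"; older ones used "KB"/"GB"/"TB". Both are accepted here.
_UNITS = {
    "bytes": 1, "B": 1,
    "KB": 1024, "KiB": 1024,
    "MB": 1024**2, "MiB": 1024**2,
    "GB": 1024**3, "GiB": 1024**3,
    "TB": 1024**4, "TiB": 1024**4,
}


def parse_load(text: str) -> int:
    """Convert a Load value copied from nodetool status (e.g. '3.5 TiB') to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised load value: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def disk_usage_by_kind(data_dir: str) -> dict:
    """Bytes under data_dir, split into live files, snapshots and backups.

    Assumes the <keyspace>/<table>/ layout, with snapshots kept in a
    'snapshots' subdirectory and incremental backups in 'backups'.
    Files are keyed by (device, inode) so hard links are counted once.
    """
    buckets = {"live": {}, "backups": {}, "snapshots": {}}
    for root, _dirs, files in os.walk(data_dir):
        parts = Path(root).relative_to(data_dir).parts
        if "snapshots" in parts:
            kind = "snapshots"
        elif "backups" in parts:
            kind = "backups"
        else:
            kind = "live"
        for name in files:
            st = os.lstat(os.path.join(root, name))
            buckets[kind][(st.st_dev, st.st_ino)] = st.st_size

    result = {"live": sum(buckets["live"].values())}
    seen = set(buckets["live"])
    for kind in ("backups", "snapshots"):
        # A hard link to a file already counted (for example a snapshot of an
        # SSTable that is still live) takes no extra space, so skip it.
        extra = {k: v for k, v in buckets[kind].items() if k not in seen}
        result[kind] = sum(extra.values())
        seen.update(extra)
    return result


def load_vs_disk(load_text: str, data_dir: str) -> dict:
    """Put the reported Load next to what is actually on disk."""
    usage = disk_usage_by_kind(data_dir)
    usage["reported"] = parse_load(load_text)
    usage["reported_minus_live"] = usage["reported"] - usage["live"]
    return usage


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Hypothetical usage: python load_check.py "3.5 TiB" [/var/lib/cassandra/data]
    data_dir = sys.argv[2] if len(sys.argv) > 2 else "/var/lib/cassandra/data"
    for key, value in load_vs_disk(sys.argv[1], data_dir).items():
        print(f"{key:>20}: {value / 1024**3:,.1f} GiB")
```

Suppose reported_minus_live stays large and keeps growing between restarts while snapshots and backups stay small. That would point at the Load figure itself rather than at real disk usage. The comparison is approximate either way, because the data directory can hold files other than the SSTables that Load is meant to count.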