This might not be good news, but in my experience C* 2.X on Windows is not ready for production yet. I've seen various file-system-related errors, and in one of the JIRAs I was told that major work (or rework) is done in 3.X to improve C* stability on Windows.
On Tue, Aug 16, 2016 at 3:44 AM, Bryan Cheng <br...@blockcypher.com> wrote:
> Hi Alaa,
>
> Sounds like you have problems that go beyond Cassandra- likely filesystem
> corruption or bad disks. I don't know enough about Windows to give you any
> specific advice but I'd try a run of chkdsk to start.
>
> --Bryan
>
> On Fri, Aug 12, 2016 at 5:19 PM, Alaa Zubaidi (PDF) <alaa.zuba...@pdf.com>
> wrote:
>
>> Hi Bryan,
>>
>> Changing disk_failure_policy to best_effort, and running nodetool scrub,
>> did not work, it generated another error:
>> java.nio.file.AccessDeniedException
>>
>> Also tried to remove all files (data, commitlog, savedcaches) and restart
>> the node fresh, and still I am getting corruption.
>>
>> And still nothing that indicates there is a HW issue?
>> All other nodes are fine
>>
>> Regards,
>> Alaa
>>
>>
>> On Fri, Aug 12, 2016 at 12:00 PM, Bryan Cheng <br...@blockcypher.com>
>> wrote:
>>
>>> Should also add that if the scope of corruption is _very_ large, and you
>>> have a good, aggressive repair policy (read: you are confident in the
>>> consistency of the data elsewhere in the cluster), you may just want to
>>> decommission and rebuild that node.
>>>
>>> On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <br...@blockcypher.com>
>>> wrote:
>>>
>>>> Looks like you're doing the offline scrub- have you tried online?
>>>>
>>>> Here's my typical process for corrupt SSTables.
>>>>
>>>> With disk_failure_policy set to stop, examine the failing sstables. If
>>>> they are very small (in the range of kbs), it is unlikely that there is any
>>>> salvageable data there. Just delete them, start the machine, and schedule a
>>>> repair ASAP.
>>>>
>>>> If they are large, then it may be worth salvaging. If the scope of
>>>> corruption is reasonable (limited to a few sstables scattered among
>>>> different keyspaces), set disk_failure_policy to best_effort, start the
>>>> machine up, and run the nodetool scrub. This is online scrub, faster than
>>>> offline scrub (at least as of 2.1.12, the last time I had to do this).
>>>>
>>>> Only if all else fails, attempt the very painful offline sstablescrub.
>>>>
>>>> Is the VMWare client Windows? (Trying to make sure it's not just the
>>>> host). YMMV but in the past Windows was somewhat of a neglected platform
>>>> wrt Cassandra. I think you'd have a lot easier time getting help if running
>>>> Linux is an option here.
>>>>
>>>>
>>>>
>>>> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <
>>>> alaa.zuba...@pdf.com> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanks for your input...
>>>>> That's what I am afraid of.
>>>>> Did you find any HW error in the VMware and HW logs? Any indication
>>>>> that the HW is the reason? I need to make sure that this is the reason
>>>>> before asking the customer to spend more money.
>>>>>
>>>>> Thanks,
>>>>> Alaa
>>>>>
>>>>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <peich...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> cassandra run on virtual server (vmware)?
>>>>>>
>>>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>>>> maybe try with larger heap allocated to sstablescrub
>>>>>>
>>>>>> this sstable corrupt i ran into it as well (on cassandra 1.2), first i
>>>>>> try nodetool scrub, still persist, then offline sstablescrub still
>>>>>> persist, wipe the node and it happen again, then i change the hardware
>>>>>> (disk and mem). things went good.
>>>>>>
>>>>>> hth
>>>>>>
>>>>>> jason
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF)
>>>>>> <alaa.zuba...@pdf.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I have a 16 Node cluster, Cassandra 2.2.1 on Windows, local installation
>>>>>> > (NOT on the cloud)
>>>>>> >
>>>>>> > and I am getting
>>>>>> > ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
>>>>>> > CassandraDaemon.java:183 - Exception in thread
>>>>>> > Thread[CompactionExecutor:2,1,main]
>>>>>> > org.apache.cassandra.io.FSReadError:
>>>>>> > org.apache.cassandra.io.sstable.CorruptSSTableException:
>>>>>> > org.apache.cassandra.io.compress.CorruptBlockException:
>>>>>> > (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
>>>>>> > length 10208.
>>>>>> > at
>>>>>> > org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
>>>>>> > ~[apache-cassandra-2.2.1.jar:2.2.1]
>>>>>> > ....
>>>>>> > ....
>>>>>> > ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting
>>>>>> > forcefully due to file system exception on startup, disk failure policy
>>>>>> > "stop"
>>>>>> >
>>>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>>>> > I removed the corrupted file and started the Node again, after one day the
>>>>>> > corruption came back again, I removed the files, and restarted Cassandra, it
>>>>>> > worked for a few days, then I ran "nodetool repair" after it finished,
>>>>>> > Cassandra failed again but with commitlog corruption, after removing the
>>>>>> > commitlog files, it failed again with another sstable corruption.
>>>>>> >
>>>>>> > I was also checking the HW, file system, and memory, the VMware logs showed
>>>>>> > no HW error, also the HW management logs showed NO problems or issues.
>>>>>> > Also checked the Windows Logs (Application and System) the only thing I
>>>>>> > found is on the system logs "Cassandra Service terminated with
>>>>>> > service-specific error Cannot create another system semaphore."
>>>>>> >
>>>>>> > I could not find anything regarding that error, all comments point to
>>>>>> > application log.
>>>>>> >
>>>>>> > Any help is appreciated..
>>>>>> >
>>>>>> > --
>>>>>> >
>>>>>> > Alaa Zubaidi
>>>>>> >
>>>>>> >
>>>>>> > This message may contain confidential and privileged information. If it has
>>>>>> > been sent to you in error, please reply to advise the sender of the error
>>>>>> > and then immediately permanently delete it and all attachments to it from
>>>>>> > your systems. If you are not the intended recipient, do not read, copy,
>>>>>> > disclose or otherwise use this message or any attachments to it. The sender
>>>>>> > disclaims any liability for such unauthorized use. PLEASE NOTE that all
>>>>>> > incoming e-mails sent to PDF e-mail accounts will be archived and may be
>>>>>> > scanned by us and/or by external service providers to detect and prevent
>>>>>> > threats to our systems, investigate illegal or inappropriate behavior,
>>>>>> > and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
>>>>>> > concerns about this process, please contact us at
>>>>>> > legal.departm...@pdf.com.
>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Alaa Zubaidi
>>>>> PDF Solutions, Inc.
>>>>> 333 West San Carlos Street, Suite 1000
>>>>> San Jose, CA 95110 USA
>>>>> Tel: 408-283-5639
>>>>> fax: 408-938-6479
>>>>> email: alaa.zuba...@pdf.com
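
For reference, the triage order Bryan describes in the quoted thread can be sketched in Python. This is only an illustration under stated assumptions: the `triage` function, the `Action` names, and both thresholds are hypothetical and invented for this sketch (the thread only says "in the range of kbs" and "a few sstables"); none of it is a Cassandra API.

```python
from enum import Enum


class Action(Enum):
    """Recovery steps named in the thread, from cheapest to most drastic."""
    DELETE_AND_REPAIR = "delete the sstable, start the node, schedule a repair"
    ONLINE_SCRUB = "set disk_failure_policy to best_effort, start the node, run nodetool scrub"
    OFFLINE_SCRUB = "run the offline sstablescrub"
    REBUILD_NODE = "decommission and rebuild the node"


# Assumed cut-offs, chosen only for illustration; tune them for your cluster.
SMALL_SSTABLE_BYTES = 1024 * 1024   # "in the range of kbs"
MAX_SCATTERED_SSTABLES = 5          # "limited to a few sstables"


def triage(sstable_bytes: int, corrupt_sstables: int,
           online_scrub_failed: bool = False) -> Action:
    """Pick a recovery step for one corrupt sstable on a node."""
    # Very large scope: rebuilding only makes sense if you trust the
    # consistency of the data elsewhere in the cluster.
    if corrupt_sstables > MAX_SCATTERED_SSTABLES:
        return Action.REBUILD_NODE
    # Tiny files are unlikely to hold salvageable data.
    if sstable_bytes < SMALL_SSTABLE_BYTES:
        return Action.DELETE_AND_REPAIR
    # Offline scrub is the last resort, after the online scrub has failed.
    if online_scrub_failed:
        return Action.OFFLINE_SCRUB
    return Action.ONLINE_SCRUB


if __name__ == "__main__":
    # A 50 MB sstable, two corrupt files on the node, online scrub not yet tried.
    print(triage(50 * 1024 * 1024, 2).value)
```

The function only encodes the order of escalation; the actual work is still done with the tools named in the thread (nodetool scrub, sstablescrub, nodetool repair).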