Re: Corrupt SSTABLE over and over

2016-08-17 Thread Kai Wang
This might not be good news for you, but in my experience C* 2.x on Windows
is not ready for production yet. I've seen various filesystem-related
errors, and in one of the JIRAs I was told that major work (or rework) was
done in 3.x to improve C* stability on Windows.


Re: Corrupt SSTABLE over and over

2016-08-15 Thread Bryan Cheng
Hi Alaa,

Sounds like you have problems that go beyond Cassandra: likely filesystem
corruption or bad disks. I don't know enough about Windows to give you any
specific advice, but I'd start with a run of chkdsk.

--Bryan


Re: Corrupt SSTABLE over and over

2016-08-12 Thread Alaa Zubaidi (PDF)
Hi Bryan,

Changing disk_failure_policy to best_effort and running nodetool scrub
did not work; it generated another error:
java.nio.file.AccessDeniedException

I also tried removing all files (data, commitlog, saved_caches) and
restarting the node fresh, and I am still getting corruption.

And still nothing indicating a HW issue?
All other nodes are fine.

Regards,
Alaa



Re: Corrupt SSTABLE over and over

2016-08-12 Thread Bryan Cheng
Should also add that if the scope of corruption is _very_ large, and you
have a good, aggressive repair policy (read: you are confident in the
consistency of the data elsewhere in the cluster), you may just want to
decommission and rebuild that node.


Re: Corrupt SSTABLE over and over

2016-08-12 Thread Bryan Cheng
Looks like you're doing the offline scrub; have you tried the online scrub?

Here's my typical process for corrupt SSTables.

With disk_failure_policy set to stop, examine the failing SSTables. If they
are very small (in the KB range), it is unlikely that there is any
salvageable data there. Just delete them, start the machine, and schedule a
repair ASAP.
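In practice, "delete them" means deleting every component file that shares the corrupt SSTable's name prefix, not only the `-Data.db` file named in the error. A minimal Python sketch, assuming the 2.2 `la-<generation>-big-<Component>` naming seen in the original error (`la-4886-big-Data.db`) and that Cassandra has already been stopped on the node; the function names are hypothetical helpers, not part of any Cassandra tool:

```python
import os
from typing import Iterable, List

DATA_SUFFIX = "Data.db"


def component_prefix(data_file: str) -> str:
    """Return the prefix shared by all components of one SSTable,
    e.g. 'la-4886-big-Data.db' -> 'la-4886-big-'."""
    name = os.path.basename(data_file)
    if not name.endswith(DATA_SUFFIX):
        raise ValueError(f"expected a *-Data.db file, got {data_file!r}")
    return name[: -len(DATA_SUFFIX)]


def sibling_components(data_file: str, listing: Iterable[str]) -> List[str]:
    """Filter a directory listing down to the components of one SSTable.

    The prefix ends with a '-' after the generation number, so
    'la-4886-big-' does not match files of generation 48860.
    """
    prefix = component_prefix(data_file)
    return sorted(name for name in listing if name.startswith(prefix))


def components_on_disk(data_file: str) -> List[str]:
    """Full paths of the files to remove (use only while the node is down)."""
    directory = os.path.dirname(os.path.abspath(data_file))
    names = sibling_components(data_file, os.listdir(directory))
    return [os.path.join(directory, name) for name in names]
```

The sketch only lists files; reviewing that list before removing anything (for example with `os.remove` in a loop) is left as a deliberate manual step.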

If they are large, then they may be worth salvaging. If the scope of
corruption is reasonable (limited to a few SSTables scattered among
different keyspaces), set disk_failure_policy to best_effort, start the
machine up, and run nodetool scrub. This is the online scrub, which is faster
than the offline scrub (at least as of 2.1.12, the last time I had to do this).

Only if all else fails, attempt the very painful offline sstablescrub.
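The triage above can be sketched as a small decision function. This is an illustration under stated assumptions: `SMALL_BYTES` and `MAX_SCRUB_CANDIDATES` are hypothetical stand-ins for "in the KB range" and "a few SSTables", the "scattered among different keyspaces" condition is not modeled, and the rebuild branch borrows Bryan's follow-up advice, which presumes you trust the consistency of the data on the other replicas. The offline sstablescrub fallback (used only if the online scrub fails) is also left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

# Hypothetical thresholds; tune them to your own data and risk tolerance.
SMALL_BYTES = 1 * 1024 * 1024
MAX_SCRUB_CANDIDATES = 5


class Action(Enum):
    DELETE_AND_REPAIR = "delete the files, start the node, schedule a repair"
    ONLINE_SCRUB = "set disk_failure_policy: best_effort, start, run nodetool scrub"
    REBUILD_NODE = "decommission and rebuild the node"


@dataclass
class CorruptSSTable:
    keyspace: str
    data_file: str
    size_bytes: int


def triage(tables: List[CorruptSSTable]) -> List[Tuple[CorruptSSTable, Action]]:
    """Map each corrupt SSTable to the first step of the process described above."""
    small = [t for t in tables if t.size_bytes < SMALL_BYTES]
    large = [t for t in tables if t.size_bytes >= SMALL_BYTES]
    # Tiny files are unlikely to hold salvageable data: drop them and repair.
    plan = [(t, Action.DELETE_AND_REPAIR) for t in small]
    # A handful of large files: try to salvage them with the online scrub.
    # Widespread corruption: rebuilding the node is likely cheaper.
    large_action = (Action.ONLINE_SCRUB if len(large) <= MAX_SCRUB_CANDIDATES
                    else Action.REBUILD_NODE)
    plan.extend((t, large_action) for t in large)
    return plan
```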

Is the VMware guest OS Windows? (Trying to make sure it's not just the host.)
YMMV, but in the past Windows was somewhat of a neglected platform with
respect to Cassandra. I think you'd have a much easier time getting help if
running Linux is an option here.




Re: Corrupt SSTABLE over and over

2016-08-12 Thread Alaa Zubaidi (PDF)
Hi Jason,

Thanks for your input.
That's what I was afraid of.
Did you find any HW errors in the VMware and HW logs, or any indication
that the HW was the cause? I need to be sure this is the reason before
asking the customer to spend more money.

Thanks,
Alaa




-- 

Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110  USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zuba...@pdf.com

-- 
This message may contain confidential and privileged information. If it
has been sent to you in error, please reply to advise the sender of the
error and then immediately permanently delete it and all attachments to it
from your systems. If you are not the intended recipient, do not read,
copy, disclose or otherwise use this message or any attachments to it. The
sender disclaims any liability for such unauthorized use. PLEASE NOTE that
all incoming e-mails sent to PDF e-mail accounts will be archived and may
be scanned by us and/or by external service providers to detect and prevent
threats to our systems, investigate illegal or inappropriate behavior,
and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
concerns about this process, please contact us at
legal.departm...@pdf.com.


Re: Corrupt SSTABLE over and over

2016-08-12 Thread Alaa Zubaidi (PDF)
One more thing I noticed:
the corrupted SSTable is mentioned twice in the log file.
[CompactionExecutor:10253] 2016-08-11 08:59:01,952 - Compacting (.)
[...la-1104-big-Data.db, ]
[CompactionExecutor:10253] 2016-08-11 09:32:04,814 - Compacting (.)
[...la-1104-big-Data.db]

Is it possible that Cassandra is trying to compact the same file again while
it's being compacted by another process?
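To check how often this happens across a whole log, one option is to count how many "Compacting" lines mention each SSTable. A minimal Python sketch, assuming log lines shaped like the excerpt above (SSTable paths ending in `-Data.db`, containing no spaces, commas, or square brackets); the function name is hypothetical. A repeat is only a hint: an SSTable may also reappear because an earlier compaction attempt was aborted (possibly by the corruption error itself) and later retried, so look at what the log says in between before concluding that two compactions overlapped.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# SSTable data-file paths as they appear inside "Compacting ... [...]" lines.
DATA_FILE = re.compile(r"[^\s\[\],]+-Data\.db")


def repeated_compaction_inputs(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many 'Compacting' lines mention each SSTable; keep repeats only."""
    counts: Counter = Counter()
    for line in lines:
        if "Compacting" not in line:
            continue
        # set(): count a file at most once per log line
        for path in set(DATA_FILE.findall(line)):
            name = re.split(r"[\\/]", path)[-1]  # drop Windows or POSIX directories
            counts[name] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

Usage would be along the lines of `with open(log_path) as f: print(repeated_compaction_inputs(f))`, where `log_path` points at the node's `system.log`.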

Regards,
Alaa

On Thu, Aug 11, 2016 at 6:20 PM, Alaa Zubaidi (PDF) 
wrote:

> Hi,
>
> I have a 16-node cluster running Cassandra 2.2.1 on Windows, installed
> locally (NOT in the cloud).
>
> I am getting:
> ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
> CassandraDaemon.java:183 - Exception in thread Thread[CompactionExecutor:2,1,main]
> org.apache.cassandra.io.FSReadError:
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> org.apache.cassandra.io.compress.CorruptBlockException:
> (E:\\la-4886-big-Data.db): corruption detected, chunk at 4969092
> of length 10208.
> at
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
> ~[apache-cassandra-2.2.1.jar:2.2.1]
> 
> 
> ERROR [CompactionExecutor:2] ... FileUtils.java:463 - Exiting
> forcefully due to file system exception on startup, disk failure policy
> "stop"
>
> I tried sstablescrub, but it crashed with hs-err-pid-...
> I removed the corrupted file and started the node again. After one day the
> corruption came back. I removed the files and restarted Cassandra, and it
> worked for a few days. Then I ran "nodetool repair"; after it finished,
> Cassandra failed again, this time with commitlog corruption. After removing
> the commitlog files, it failed again with another SSTable corruption.
>
> I also checked the HW, file system, and memory: the VMware logs
> showed no HW errors, and the HW management logs showed NO problems or
> issues.
> I also checked the Windows logs (Application and System). The only thing I
> found is in the System log: "Cassandra Service terminated with
> service-specific error Cannot create another system semaphore."
>
> I could not find anything regarding that error; all comments point to
> the application log.
>
> Any help is appreciated.
>
> --
>
> Alaa Zubaidi
>
>




Re: Corrupt SSTABLE over and over

2016-08-12 Thread Jason Wee
Is Cassandra running on a virtual server (VMware)?

> I tried sstablescrub but it crashed with hs-err-pid-...
Maybe try allocating a larger heap to sstablescrub.
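One way to give the offline scrub more heap, sketched in Python. This assumes the Linux `bin/sstablescrub` wrapper, which in 2.x reads the `MAX_HEAP_SIZE` environment variable (falling back to a small default); the Windows `.bat` wrapper may size the heap differently, so check the script you actually run. The `4G` default, the keyspace/table names in the usage line, and the helper names are placeholders. Whether a bigger heap helps depends on what the hs_err_pid file reports.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def sstablescrub_invocation(keyspace: str, table: str, heap: str = "4G",
                            base_env: Optional[Mapping[str, str]] = None
                            ) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for an offline scrub with a larger heap."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_HEAP_SIZE"] = heap  # assumption: read by the Linux wrapper script
    return ["sstablescrub", keyspace, table], env


def run_offline_scrub(keyspace: str, table: str, heap: str = "4G") -> None:
    """Run sstablescrub; Cassandra must be stopped on this node first."""
    argv, env = sstablescrub_invocation(keyspace, table, heap)
    subprocess.run(argv, env=env, check=True)
```

For example, `run_offline_scrub("my_ks", "my_table", heap="8G")` with the node down.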

I ran into this SSTable corruption as well (on Cassandra 1.2). First I
tried nodetool scrub, but it persisted; then offline sstablescrub, and it
still persisted. I wiped the node and it happened again. Then I changed the
hardware (disk and memory), and things went fine.

hth

jason


On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF)
 wrote:
> Hi,
>
> I have a 16-node cluster running Cassandra 2.2.1 on Windows, installed
> locally (NOT in the cloud),
>
> and I am getting:
> ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
> CassandraDaemon.java:183 - Exception in thread Thread[CompactionExecutor:2,1,main]
> org.apache.cassandra.io.FSReadError:
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> org.apache.cassandra.io.compress.CorruptBlockException:
> (E:\\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
> length 10208.
> at
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
> ~[apache-cassandra-2.2.1.jar:2.2.1]
> 
> 
> ERROR [CompactionExecutor:2] ... FileUtils.java:463 - Exiting
> forcefully due to file system exception on startup, disk failure policy
> "stop"
>
> I tried sstablescrub but it crashed with hs-err-pid-...
> I removed the corrupted file and started the node again; after one day the
> corruption came back. I removed the files and restarted Cassandra, and it
> worked for a few days. Then I ran "nodetool repair"; after it finished,
> Cassandra failed again, this time with commitlog corruption. After I
> removed the commitlog files, it failed again with another SSTable corruption.
>
> I also checked the hardware, file system, and memory: the VMware logs showed
> no hardware errors, and the hardware management logs showed no problems or
> issues.
> In the Windows logs (Application and System), the only thing I found is this
> entry in the System log: "Cassandra Service terminated with service-specific
> error Cannot create another system semaphore."
>
> I could not find anything about that error; all the comments I found point
> to the application log.
>
> Any help is appreciated.
>
> --
>
> Alaa Zubaidi
>


Corrupt SSTABLE over and over

2016-08-11 Thread Alaa Zubaidi (PDF)
Hi,

I have a 16-node cluster running Cassandra 2.2.1 on Windows, installed
locally (NOT in the cloud),

and I am getting:
ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
CassandraDaemon.java:183 - Exception in thread Thread[CompactionExecutor:2,1,main]
org.apache.cassandra.io.FSReadError:
org.apache.cassandra.io.sstable.CorruptSSTableException:
org.apache.cassandra.io.compress.CorruptBlockException:
(E:\\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
length 10208.
at
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
~[apache-cassandra-2.2.1.jar:2.2.1]


ERROR [CompactionExecutor:2] ... FileUtils.java:463 - Exiting
forcefully due to file system exception on startup, disk failure policy
"stop"

I tried sstablescrub but it crashed with hs-err-pid-...
I removed the corrupted file and started the node again; after one day the
corruption came back. I removed the files and restarted Cassandra, and it
worked for a few days. Then I ran "nodetool repair"; after it finished,
Cassandra failed again, this time with commitlog corruption. After I removed
the commitlog files, it failed again with another SSTable corruption.

I also checked the hardware, file system, and memory: the VMware logs showed
no hardware errors, and the hardware management logs showed no problems or
issues.
In the Windows logs (Application and System), the only thing I found is this
entry in the System log: "Cassandra Service terminated with service-specific
error Cannot create another system semaphore."

I could not find anything about that error; all the comments I found point
to the application log.

Any help is appreciated.

-- 

Alaa Zubaidi
