Re: Corrupted sstables

2019-05-14 Thread Roy Burstein
Hi Alain ,
We are adding 12 tables in a weekly job, and dropping the history table.
Our job checks for schema mismatch by running "SELECT peer,
schema_version, tokens FROM peers" before it adds/drops each table.
nodetool describecluster looks OK, only one schema version.
Cluster Information:
Name:  [removed]
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
30cb4963-109c-3077-8bdd-df9bfb313568: [10...output truncated]
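
For reference, a minimal sketch of this kind of pre-DDL schema-agreement
check (assuming the DataStax Python driver; the contact point is a
placeholder):

from cassandra.cluster import Cluster

# Connect to any live node; "10.0.0.1" is a placeholder contact point.
cluster = Cluster(["10.0.0.1"])
session = cluster.connect("system")

# The node's own schema version plus every peer's version should match.
local = session.execute("SELECT schema_version FROM local").one()
versions = {local.schema_version}
for row in session.execute("SELECT peer, schema_version FROM peers"):
    versions.add(row.schema_version)

if len(versions) > 1:
    raise RuntimeError("schema disagreement, abort DDL: %s" % versions)

cluster.shutdown()

Only when a single version comes back (matching what 'nodetool
describecluster' shows) is it safe for the job to issue the next CREATE/DROP.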

We recently shut down the cluster (maintenance job) and started it again, but
the job is daily.
I have tried to correlate the job times with the table corruption timestamps
but did not find any relation; still, this direction may be relevant.

Thanks,
Roy

On Fri, May 10, 2019 at 3:13 PM Alain RODRIGUEZ  wrote:

> Hello Roy,
>
> The name of the table makes me think that you might be doing automated
> changes to the schema. I just dug this topic for someone else and schema
> changes are way less consistent than standard Cassandra operations (see
> https://issues.apache.org/jira/browse/CASSANDRA-10699).
>
>> sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-I
>>
>>
> Idea 1: Some of these queries might have failed for multiple reasons on a
> node (down for too long, race conditions, ...), leaving the cluster in an
> unstable state where there is a schema disagreement. In that case, you
> could have troubles when adding a new node I have seen it happening. Could
> you check/share with us the output of: 'nodetool describecluster'?
>
> Also did you tried recently to perform a rolling restart? This often helps
> synchronising local schemas and 'could' fix the issue. Another option is
> 'nodetool resetlocalschema' on node(s) out of sync.
>
> idea 2: If you identified that you have broken second indexes, maybe give
> a try at running 'nodetool rebuild_index <keyspace> <table> <index>'
> on all nodes before adding the next node?
> https://cassandra.apache.org/doc/latest/tools/nodetool/rebuild_index.html
>
> Hope this helps,
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> Le jeu. 9 mai 2019 à 17:29, Jason Wee  a écrit :
>
>> maybe print out value into the logfile and that should lead to some
>> clue where it might be the problem?
>>
>> On Tue, May 7, 2019 at 4:58 PM Paul Chandler  wrote:
>> >
>> > Roy, We spent along time trying to fix it, but didn’t find a solution,
>> it was a test cluster, so we ended up rebuilding the cluster, rather than
>> spending anymore time trying to fix the corruption. We have worked out what
>> had caused it, so were happy it wasn’t going to occur in production. Sorry
>> that is not much help, but I am not even sure it is the same issue you have.
>> >
>> > Paul
>> >
>> >
>> >
>> > On 7 May 2019, at 07:14, Roy Burstein  wrote:
>> >
>> > I can say that it happens now as well ,currently no node has been
>> added/removed .
>> > Corrupted sstables are usually the index files and in some machines the
>> sstable even does not exist on the filesystem.
>> > On one machine I was able to dump the sstable to dump file without any
>> issue  . Any idea how to tackle this issue ?
>> >
>> >
>> > On Tue, May 7, 2019 at 12:32 AM Paul Chandler 
>> wrote:
>> >>
>> >> Roy,
>> >>
>> >> I have seen this exception before when a column had been dropped then
>> re added with the same name but a different type. In particular we dropped
>> a column and re created it as static, then had this exception from the old
>> sstables created prior to the ddl change.
>> >>
>> >> Not sure if this applies in your case.
>> >>
>> >> Thanks
>> >>
>> >> Paul
>> >>
>> >> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
>> >>
>> >> can Disk have bad sectors? fccheck or something similar can help.
>> >>
>> >> Long shot: repair or any other operation conflicting. Would leave that
>> to others.
>> >>
>> >> On Mon, May 6, 2019 at 3:50 PM Roy Burstein 
>> wrote:
>> >>>
>> >>> It happens on the same column families and they have the same ddl (as
>> already posted) . I did not check it after cleanup
>> >>> .
>> >>>
>> >>> On Mon, May 6, 2019, 23:43 Nitan Kainth 
>> 

Re: Corrupted sstables

2019-05-10 Thread Alain RODRIGUEZ
Hello Roy,

The name of the table makes me think that you might be doing automated
changes to the schema. I just dug into this topic for someone else, and schema
changes are way less consistent than standard Cassandra operations (see
https://issues.apache.org/jira/browse/CASSANDRA-10699).

> sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-I
>
>
Idea 1: Some of these queries might have failed for multiple reasons on a
node (down for too long, race conditions, ...), leaving the cluster in an
unstable state where there is a schema disagreement. In that case, you
could have trouble when adding a new node; I have seen it happen. Could
you check/share with us the output of 'nodetool describecluster'?

Also, did you recently try to perform a rolling restart? This often helps
synchronise local schemas and 'could' fix the issue. Another option is
'nodetool resetlocalschema' on node(s) out of sync.
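
As an illustration only, resetting the local schema on lagging nodes could be
scripted roughly like this (a sketch; the host list is hypothetical, and it
assumes nodetool can reach each node's JMX port remotely):

import subprocess

# Hypothetical list of nodes whose schema_version disagrees.
out_of_sync = ["10.0.0.7", "10.0.0.12"]

for host in out_of_sync:
    # Drops the node's local schema and re-pulls it from the cluster;
    # safer to run one node at a time and re-check describecluster after.
    subprocess.run(["nodetool", "-h", host, "resetlocalschema"], check=True)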

Idea 2: If you identified that you have broken secondary indexes, maybe give a
try at running 'nodetool rebuild_index <keyspace> <table> <index>' on
all nodes before adding the next node?
https://cassandra.apache.org/doc/latest/tools/nodetool/rebuild_index.html

Hope this helps,
C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



Le jeu. 9 mai 2019 à 17:29, Jason Wee  a écrit :

> maybe print out value into the logfile and that should lead to some
> clue where it might be the problem?
>
> On Tue, May 7, 2019 at 4:58 PM Paul Chandler  wrote:
> >
> > Roy, We spent along time trying to fix it, but didn’t find a solution,
> it was a test cluster, so we ended up rebuilding the cluster, rather than
> spending anymore time trying to fix the corruption. We have worked out what
> had caused it, so were happy it wasn’t going to occur in production. Sorry
> that is not much help, but I am not even sure it is the same issue you have.
> >
> > Paul
> >
> >
> >
> > On 7 May 2019, at 07:14, Roy Burstein  wrote:
> >
> > I can say that it happens now as well ,currently no node has been
> added/removed .
> > Corrupted sstables are usually the index files and in some machines the
> sstable even does not exist on the filesystem.
> > On one machine I was able to dump the sstable to dump file without any
> issue  . Any idea how to tackle this issue ?
> >
> >
> > On Tue, May 7, 2019 at 12:32 AM Paul Chandler  wrote:
> >>
> >> Roy,
> >>
> >> I have seen this exception before when a column had been dropped then
> re added with the same name but a different type. In particular we dropped
> a column and re created it as static, then had this exception from the old
> sstables created prior to the ddl change.
> >>
> >> Not sure if this applies in your case.
> >>
> >> Thanks
> >>
> >> Paul
> >>
> >> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
> >>
> >> can Disk have bad sectors? fccheck or something similar can help.
> >>
> >> Long shot: repair or any other operation conflicting. Would leave that
> to others.
> >>
> >> On Mon, May 6, 2019 at 3:50 PM Roy Burstein 
> wrote:
> >>>
> >>> It happens on the same column families and they have the same ddl (as
> already posted) . I did not check it after cleanup
> >>> .
> >>>
> >>> On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:
> >>>>
> >>>> This is strange, never saw this. does it happen to same column family?
> >>>>
> >>>> Does it happen after cleanup?
> >>>>
> >>>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein 
> wrote:
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>> On Mon, May 6, 2019, 23:23 Nitan Kainth 
> wrote:
> >>>>>>
> >>>>>> Roy,
> >>>>>>
> >>>>>> You mean all nodes show corruption when you add a node to cluster??
> >>>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Nitan
> >>>>>> Cell: 510 449 9629
> >>>>>>
> >>>>>> On May 6, 2019, at 2:48 PM, Roy Burstein 
> wrote:
> >>>>>>
> >>>>>> It happened  on all the servers in the cluster every time I have
> added node
> >>>>>> .
> >>>>>> This is new cluster nothing was upgraded here , we have a similar
> cluster
> >>>>>> running on C* 2.1.15 with no issues .
> >>>>>> We are aware to the scrub utility just it reproduce every time we
> added
> >>>>>> node to the cluster .
> >>>>>>
> >>>>>> We have many tables there
> >>
> >>
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Corrupted sstables

2019-05-09 Thread Jason Wee
maybe print the value out into the logfile, and that should lead to some
clue about where the problem might be?

On Tue, May 7, 2019 at 4:58 PM Paul Chandler  wrote:
>
> Roy, We spent along time trying to fix it, but didn’t find a solution, it was 
> a test cluster, so we ended up rebuilding the cluster, rather than spending 
> anymore time trying to fix the corruption. We have worked out what had caused 
> it, so were happy it wasn’t going to occur in production. Sorry that is not 
> much help, but I am not even sure it is the same issue you have.
>
> Paul
>
>
>
> On 7 May 2019, at 07:14, Roy Burstein  wrote:
>
> I can say that it happens now as well ,currently no node has been 
> added/removed .
> Corrupted sstables are usually the index files and in some machines the 
> sstable even does not exist on the filesystem.
> On one machine I was able to dump the sstable to dump file without any issue  
> . Any idea how to tackle this issue ?
>
>
> On Tue, May 7, 2019 at 12:32 AM Paul Chandler  wrote:
>>
>> Roy,
>>
>> I have seen this exception before when a column had been dropped then re 
>> added with the same name but a different type. In particular we dropped a 
>> column and re created it as static, then had this exception from the old 
>> sstables created prior to the ddl change.
>>
>> Not sure if this applies in your case.
>>
>> Thanks
>>
>> Paul
>>
>> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
>>
>> can Disk have bad sectors? fccheck or something similar can help.
>>
>> Long shot: repair or any other operation conflicting. Would leave that to 
>> others.
>>
>> On Mon, May 6, 2019 at 3:50 PM Roy Burstein  wrote:
>>>
>>> It happens on the same column families and they have the same ddl (as 
>>> already posted) . I did not check it after cleanup
>>> .
>>>
>>> On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:
>>>>
>>>> This is strange, never saw this. does it happen to same column family?
>>>>
>>>> Does it happen after cleanup?
>>>>
>>>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein  wrote:
>>>>>
>>>>> Yes.
>>>>>
>>>>> On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:
>>>>>>
>>>>>> Roy,
>>>>>>
>>>>>> You mean all nodes show corruption when you add a node to cluster??
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Nitan
>>>>>> Cell: 510 449 9629
>>>>>>
>>>>>> On May 6, 2019, at 2:48 PM, Roy Burstein  wrote:
>>>>>>
>>>>>> It happened  on all the servers in the cluster every time I have added 
>>>>>> node
>>>>>> .
>>>>>> This is new cluster nothing was upgraded here , we have a similar cluster
>>>>>> running on C* 2.1.15 with no issues .
>>>>>> We are aware to the scrub utility just it reproduce every time we added
>>>>>> node to the cluster .
>>>>>>
>>>>>> We have many tables there
>>
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Corrupted sstables

2019-05-07 Thread Paul Chandler
Roy, We spent a long time trying to fix it, but didn’t find a solution. It was
a test cluster, so we ended up rebuilding the cluster rather than spending
any more time trying to fix the corruption. We worked out what had caused
it, so we were happy it wasn’t going to occur in production. Sorry that is not
much help, but I am not even sure it is the same issue you have.

Paul



> On 7 May 2019, at 07:14, Roy Burstein  wrote:
> 
> I can say that it happens now as well ,currently no node has been 
> added/removed . 
> Corrupted sstables are usually the index files and in some machines the 
> sstable even does not exist on the filesystem.
> On one machine I was able to dump the sstable to dump file without any issue  
> . Any idea how to tackle this issue ? 
>  
> 
> On Tue, May 7, 2019 at 12:32 AM Paul Chandler  wrote:
> Roy,
> 
> I have seen this exception before when a column had been dropped then re 
> added with the same name but a different type. In particular we dropped a 
> column and re created it as static, then had this exception from the old 
> sstables created prior to the ddl change.
> 
> Not sure if this applies in your case.
> 
> Thanks 
> 
> Paul
> 
>> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
>> 
>> can Disk have bad sectors? fccheck or something similar can help.
>> 
>> Long shot: repair or any other operation conflicting. Would leave that to 
>> others.
>> 
>> On Mon, May 6, 2019 at 3:50 PM Roy Burstein  wrote:
>> It happens on the same column families and they have the same ddl (as 
>> already posted) . I did not check it after cleanup 
>> .
>> 
>> On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:
>> This is strange, never saw this. does it happen to same column family?
>> 
>> Does it happen after cleanup?
>> 
>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein  wrote:
>> Yes.
>> 
>> On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:
>> Roy,
>> 
>> You mean all nodes show corruption when you add a node to cluster??
>> 
>> 
>> Regards,
>> Nitan
>> Cell: 510 449 9629 
>> 
>> On May 6, 2019, at 2:48 PM, Roy Burstein  wrote:
>> 
>>> It happened  on all the servers in the cluster every time I have added node
>>> .
>>> This is new cluster nothing was upgraded here , we have a similar cluster
>>> running on C* 2.1.15 with no issues .
>>> We are aware to the scrub utility just it reproduce every time we added
>>> node to the cluster .
>>> 
>>> We have many tables there
> 



Re: Corrupted sstables

2019-05-07 Thread Roy Burstein
I can say that it happens now as well; currently no node has been
added/removed.
Corrupted sstables are usually the index files, and on some machines the
sstable does not even exist on the filesystem.
On one machine I was able to dump the sstable to a dump file without any
issue. Any idea how to tackle this issue?


On Tue, May 7, 2019 at 12:32 AM Paul Chandler  wrote:

> Roy,
>
> I have seen this exception before when a column had been dropped then re
> added with the same name but a different type. In particular we dropped a
> column and re created it as static, then had this exception from the old
> sstables created prior to the ddl change.
>
> Not sure if this applies in your case.
>
> Thanks
>
> Paul
>
> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
>
> can Disk have bad sectors? fccheck or something similar can help.
>
> Long shot: repair or any other operation conflicting. Would leave that to
> others.
>
> On Mon, May 6, 2019 at 3:50 PM Roy Burstein 
> wrote:
>
>> It happens on the same column families and they have the same ddl (as
>> already posted) . I did not check it after cleanup
>> .
>>
>> On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:
>>
>>> This is strange, never saw this. does it happen to same column family?
>>>
>>> Does it happen after cleanup?
>>>
>>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein 
>>> wrote:
>>>
>>>> Yes.
>>>>
>>>> On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:
>>>>
>>>>> Roy,
>>>>>
>>>>> You mean all nodes show corruption when you add a node to cluster??
>>>>>
>>>>>
>>>>> Regards,
>>>>> Nitan
>>>>> Cell: 510 449 9629
>>>>>
>>>>> On May 6, 2019, at 2:48 PM, Roy Burstein 
>>>>> wrote:
>>>>>
>>>>> It happened  on all the servers in the cluster every time I have added
>>>>> node
>>>>> .
>>>>> This is new cluster nothing was upgraded here , we have a similar
>>>>> cluster
>>>>> running on C* 2.1.15 with no issues .
>>>>> We are aware to the scrub utility just it reproduce every time we added
>>>>> node to the cluster .
>>>>>
>>>>> We have many tables there
>>>>>
>>>>>
>


Re: Corrupted sstables

2019-05-06 Thread Paul Chandler
Roy,

I have seen this exception before when a column had been dropped and then
re-added with the same name but a different type. In particular, we dropped a
column and re-created it as static, then had this exception from the old
sstables created prior to the DDL change.

Not sure if this applies in your case.
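
To make that failure mode concrete, the sequence would look something like
this (a hypothetical illustration only; the keyspace, table, and column names
are invented, and recent Cassandra versions may refuse the re-add outright):

from cassandra.cluster import Cluster

session = Cluster(["10.0.0.1"]).connect()  # placeholder contact point

# 'flag' starts life as a regular column, so sstables written now
# serialize it per clustering row. Assumes keyspace 'demo' exists.
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.events (
        pk int, ck int, flag int,
        PRIMARY KEY (pk, ck))""")

# Dropping it and re-creating it as static changes how readers expect
# to find it; sstables written before the change can then fail with
# CorruptSSTableException-style errors until they are rewritten.
session.execute("ALTER TABLE demo.events DROP flag")
session.execute("ALTER TABLE demo.events ADD flag int static")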

Thanks 

Paul

> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
> 
> can Disk have bad sectors? fccheck or something similar can help.
> 
> Long shot: repair or any other operation conflicting. Would leave that to 
> others.
> 
> On Mon, May 6, 2019 at 3:50 PM Roy Burstein  wrote:
> It happens on the same column families and they have the same ddl (as already 
> posted) . I did not check it after cleanup 
> .
> 
> On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:
> This is strange, never saw this. does it happen to same column family?
> 
> Does it happen after cleanup?
> 
> On Mon, May 6, 2019 at 3:41 PM Roy Burstein  wrote:
> Yes.
> 
> On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:
> Roy,
> 
> You mean all nodes show corruption when you add a node to cluster??
> 
> 
> Regards,
> Nitan
> Cell: 510 449 9629 
> 
> On May 6, 2019, at 2:48 PM, Roy Burstein  wrote:
> 
>> It happened  on all the servers in the cluster every time I have added node
>> .
>> This is new cluster nothing was upgraded here , we have a similar cluster
>> running on C* 2.1.15 with no issues .
>> We are aware to the scrub utility just it reproduce every time we added
>> node to the cluster .
>> 
>> We have many tables there



Re: Corrupted sstables

2019-05-06 Thread Nitan Kainth
Can the disk have bad sectors? fsck or something similar can help.

Long shot: repair or any other operation conflicting. Would leave that to
others.

On Mon, May 6, 2019 at 3:50 PM Roy Burstein  wrote:

> It happens on the same column families and they have the same ddl (as
> already posted) . I did not check it after cleanup
> .
>
> On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:
>
>> This is strange, never saw this. does it happen to same column family?
>>
>> Does it happen after cleanup?
>>
>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein 
>> wrote:
>>
>>> Yes.
>>>
>>> On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:
>>>
 Roy,

 You mean all nodes show corruption when you add a node to cluster??


 Regards,

 Nitan

 Cell: 510 449 9629

 On May 6, 2019, at 2:48 PM, Roy Burstein 
 wrote:

 It happened  on all the servers in the cluster every time I have added
 node
 .
 This is new cluster nothing was upgraded here , we have a similar
 cluster
 running on C* 2.1.15 with no issues .
 We are aware to the scrub utility just it reproduce every time we added
 node to the cluster .

 We have many tables there




Re: Corrupted sstables

2019-05-06 Thread Roy Burstein
It happens on the same column families, and they have the same DDL (as
already posted). I did not check it after cleanup.

On Mon, May 6, 2019, 23:43 Nitan Kainth  wrote:

> This is strange, never saw this. does it happen to same column family?
>
> Does it happen after cleanup?
>
> On Mon, May 6, 2019 at 3:41 PM Roy Burstein 
> wrote:
>
>> Yes.
>>
>> On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:
>>
>>> Roy,
>>>
>>> You mean all nodes show corruption when you add a node to cluster??
>>>
>>>
>>> Regards,
>>>
>>> Nitan
>>>
>>> Cell: 510 449 9629
>>>
>>> On May 6, 2019, at 2:48 PM, Roy Burstein  wrote:
>>>
>>> It happened  on all the servers in the cluster every time I have added
>>> node
>>> .
>>> This is new cluster nothing was upgraded here , we have a similar cluster
>>> running on C* 2.1.15 with no issues .
>>> We are aware to the scrub utility just it reproduce every time we added
>>> node to the cluster .
>>>
>>> We have many tables there
>>>
>>>


Re: Corrupted sstables

2019-05-06 Thread Nitan Kainth
This is strange, I never saw this. Does it happen to the same column family?

Does it happen after cleanup?

On Mon, May 6, 2019 at 3:41 PM Roy Burstein  wrote:

> Yes.
>
> On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:
>
>> Roy,
>>
>> You mean all nodes show corruption when you add a node to cluster??
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On May 6, 2019, at 2:48 PM, Roy Burstein  wrote:
>>
>> It happened  on all the servers in the cluster every time I have added
>> node
>> .
>> This is new cluster nothing was upgraded here , we have a similar cluster
>> running on C* 2.1.15 with no issues .
>> We are aware to the scrub utility just it reproduce every time we added
>> node to the cluster .
>>
>> We have many tables there
>>
>>


Re: Corrupted sstables

2019-05-06 Thread Roy Burstein
Yes.

On Mon, May 6, 2019, 23:23 Nitan Kainth  wrote:

> Roy,
>
> You mean all nodes show corruption when you add a node to cluster??
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On May 6, 2019, at 2:48 PM, Roy Burstein  wrote:
>
> It happened  on all the servers in the cluster every time I have added node
> .
> This is new cluster nothing was upgraded here , we have a similar cluster
> running on C* 2.1.15 with no issues .
> We are aware to the scrub utility just it reproduce every time we added
> node to the cluster .
>
> We have many tables there
>
>


Re: Corrupted sstables

2019-05-06 Thread Nitan Kainth
Roy,

You mean all nodes show corruption when you add a node to the cluster?


Regards,
Nitan
Cell: 510 449 9629

> On May 6, 2019, at 2:48 PM, Roy Burstein  wrote:
> 
> It happened  on all the servers in the cluster every time I have added node
> .
> This is new cluster nothing was upgraded here , we have a similar cluster
> running on C* 2.1.15 with no issues .
> We are aware to the scrub utility just it reproduce every time we added
> node to the cluster .
> 
> We have many tables there


Re: Corrupted sstables

2019-05-06 Thread Roy Burstein
It happened on all the servers in the cluster every time I have added a node.
This is a new cluster, nothing was upgraded here; we have a similar cluster
running on C* 2.1.15 with no issues.
We are aware of the scrub utility; it just reproduces every time we add a
node to the cluster.

We have many tables there; the DDL of the corrupted sstables looks the same:
CREATE TABLE rawdata.a1 (
session_start_time_timeslice bigint,
uid_bucket int,
vid_bucket int,
pid int,
uid text,
sid bigint,
vid bigint,
data_type text,
data_id bigint,
data blob,
PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket),
pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type
ASC, data_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.2
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

CREATE TABLE rawdata.a2 (
session_start_time_timeslice bigint,
uid_bucket int,
vid_bucket int,
pid int,
uid text,
sid bigint,
vid bigint,
data_type text,
data_id bigint,
data blob,
PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket),
pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type
ASC, data_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.2
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

CREATE TABLE rawdata.a3 (
session_start_time_timeslice bigint,
uid_bucket int,
vid_bucket int,
pid int,
uid text,
sid bigint,
vid bigint,
data_type text,
data_id bigint,
data blob,
PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket),
pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type
ASC, data_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.2
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';


CREATE TABLE rawdata.a4 (
session_start_time_timeslice bigint,
uid_bucket int,
vid_bucket int,
pid int,
uid text,
sid bigint,
vid bigint,
data_type text,
data_id bigint,
data blob,
PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket),
pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type
ASC, data_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.2
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';



On Mon, May 6, 2019 at 9:44 PM Jeff Jirsa  wrote:

> Before you scrub, from which version were you upgrading and can you post
&g

Re: Corrupted sstables

2019-05-06 Thread Jeff Jirsa
Before you scrub, from which version were you upgrading and can you post a(n 
anonymized) schema?

-- 
Jeff Jirsa


> On May 6, 2019, at 11:37 AM, Nitan Kainth  wrote:
> 
> Did you try sstablescrub?
> If that doesn't work, you can delete all files of this sstable id and then 
> run repair -pr on this node.
> 
>> On Mon, May 6, 2019 at 9:20 AM Roy Burstein  wrote:
>> Hi , 
>> We are having issues with Cassandra 3.11.4 , after adding node to the 
>> cluster we get many corrupted files across the cluster (almost all nodes) 
>> ,this is reproducible in our env.  .
>> We  have 69 nodes in the cluster ,disk_access_mode: standard . 
>> 
>> The stack trace : 
>> WARN  [ReadStage-4] 2019-05-06 06:44:19,843 
>> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
>> Thread[ReadStage-4,5,main]: {}
>> java.lang.RuntimeException: 
>> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
>> /var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
>> at 
>> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2588)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
>> ~[na:1.8.0-zing_19.03.0.0]
>> at 
>> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>>  [apache-cassandra-3.11.4.jar:3.11.4]
>> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> at java.lang.Thread.run(Thread.java:748) [na:1.8.0-zing_19.03.0.0]
>> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
>> Corrupted: 
>> /var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
>> at 
>> org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:275)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.io.sstable.format.SSTableReader.getPosition(SSTableReader.java:1586)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:64)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.initializeIterator(UnfilteredRowIteratorWithLowerBound.java:108)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:99)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:119)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:48)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:525)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:385)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.rows.UnfilteredRowIterator.isEmpty(UnfilteredRowIterator.java:67)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.db.SinglePartitionReadCommand.withSSTablesIterated(SinglePartitionReadCommand.java:853)
>>  

Re: Corrupted sstables

2019-05-06 Thread Nitan Kainth
Did you try sstablescrub?
If that doesn't work, you can delete all files of this sstable ID and then
run 'nodetool repair -pr' on this node.
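
For illustration, that recovery path might be scripted like this (a sketch
only; the keyspace/table names are placeholders taken from the stack trace,
and Cassandra must be stopped on the node before the offline scrub):

import subprocess

KS, TABLE = "sessions_rawdata", "sessions_v2_2019_05_06"  # placeholders

# Option 1: offline scrub of the suspect table (run with the node stopped).
subprocess.run(["sstablescrub", KS, TABLE], check=True)

# Option 2: after deleting the corrupted sstable's files and starting the
# node again, repair just this node's primary ranges.
subprocess.run(["nodetool", "repair", "-pr", KS], check=True)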

On Mon, May 6, 2019 at 9:20 AM Roy Burstein  wrote:

> Hi ,
> We are having issues with Cassandra 3.11.4 , after adding node to the
> cluster we get many corrupted files across the cluster (almost all nodes)
> ,this is reproducible in our env.  .
> We  have 69 nodes in the cluster ,disk_access_mode: standard .
>
> The stack trace :
>
> WARN  [ReadStage-4] 2019-05-06 06:44:19,843 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-4,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2588)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0-zing_19.03.0.0]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>  [apache-cassandra-3.11.4.jar:3.11.4]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114) 
> [apache-cassandra-3.11.4.jar:3.11.4]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0-zing_19.03.0.0]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
> Corrupted: 
> /var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:275)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.io.sstable.format.SSTableReader.getPosition(SSTableReader.java:1586)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:64)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.initializeIterator(UnfilteredRowIteratorWithLowerBound.java:108)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:99)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:119)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:48)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:525)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:385)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIterator.isEmpty(UnfilteredRowIterator.java:67)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.SinglePartitionReadCommand.withSSTablesIterated(SinglePartitionReadCommand.java:853)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:797)
>  ~[apache-cassandra-3.11.4.jar:3.11.4]
> at 
> 

Corrupted sstables

2019-05-06 Thread Roy Burstein
Hi,
We are having issues with Cassandra 3.11.4: after adding a node to the
cluster we get many corrupted files across the cluster (almost all nodes);
this is reproducible in our env.
We have 69 nodes in the cluster, disk_access_mode: standard.

The stack trace :

WARN  [ReadStage-4] 2019-05-06 06:44:19,843
AbstractLocalAwareExecutorService.java:167 - Uncaught exception on
thread Thread[ReadStage-4,5,main]: {}
java.lang.RuntimeException:
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
/var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2588)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0-zing_19.03.0.0]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114)
[apache-cassandra-3.11.4.jar:3.11.4]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0-zing_19.03.0.0]
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException:
Corrupted: 
/var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
at 
org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:275)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.io.sstable.format.SSTableReader.getPosition(SSTableReader.java:1586)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:64)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.initializeIterator(UnfilteredRowIteratorWithLowerBound.java:108)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:99)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:119)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:48)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:525)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:385)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.rows.UnfilteredRowIterator.isEmpty(UnfilteredRowIterator.java:67)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.SinglePartitionReadCommand.withSSTablesIterated(SinglePartitionReadCommand.java:853)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:797)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:670)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(SinglePartitionReadCommand.java:504)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:423)
~[apache-cassandra-3.11.4.jar:3.11.4]
at 

Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Russ Lavoie
We are using Cassandra 1.2.10 (with JNA installed) on Ubuntu 12.04.3 and are
running our instances in Amazon Web Services.

What I am trying to do.

Our Cassandra system's data is on an EBS volume, so we can take snapshots of
the data, create volumes based on those snapshots, and restore them where we
want to.

The snapshot process 

Step 1
Login to  the cassandra node.

Step 2
Run nodetool clearsnapshot

Step 3
Run nodetool snapshot

Step 4
Take EBS snapshot

The above steps are performed only after the previous command returns.

Restore Process

Step 1
Remove data/system, commit_log, the saved_caches, and data/keyspace/*
(excluding the snapshot directory)

Step 2
Move all snapshot files into their respective KS/CF locations

Step 3
Start Cassandra

Step 4 
Create the schema

Step 5
Look at the log.  This is where I find a corrupted sstable in our keyspace (not 
system).

Troubleshooting

I suspected a race condition so I did the following:

I inserted a sleep for 60 seconds after issuing “nodetool clearsnapshot” 
I inserted a sleep for 60 seconds after issuing “nodetool snapshot”

Took the snapshot
Restored the snapshot as stated above following those same steps.
It worked with no problem at all.

So my assumption is that Cassandra is doing a few more things after the 
“nodetool snapshot” returns.

Now that you know what is going on, I have my question.

How can I tell when a snapshot is fully complete so I do not have corrupted 
SSTables?

I can reproduce this 100% of the time.

Thanks for your help


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Robert Coli
On Fri, Mar 28, 2014 at 11:15 AM, Russ Lavoie ussray...@yahoo.com wrote:

 We are using cassandra 1.2.10 (With JNA installed) on ubuntu 12.04.3 and
 are running our instances in Amazon Web Services.



 Our cassandra systems data is on an EBS volume


Best practice for Cassandra on AWS is to run on ephemeral stripe, not EBS.


 so we can take snapshots of the data and create volumes based on those
 snapshots and restore them where we want to.


https://github.com/synack/tablesnap

?


 How can I tell when a snapshot is fully complete so I do not have
 corrupted SSTables?


SSTables are immutable after they are created. I'm not sure how you're
getting a snapshot that has corrupted SSTables in it. If you can repro
reliably, file a JIRA on issues.apache.org.

=Rob


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Russ Lavoie
Thank you for your quick response.

Is there a way to tell when a snapshot is completely done?



On Friday, March 28, 2014 1:30 PM, Robert Coli rc...@eventbrite.com wrote:
 
On Fri, Mar 28, 2014 at 11:15 AM, Russ Lavoie ussray...@yahoo.com wrote:

We are using cassandra 1.2.10 (With JNA installed) on ubuntu 12.04.3 and are 
running our instances in Amazon Web Services.
 
Our cassandra systems data is on an EBS volume

Best practice for Cassandra on AWS is to run on ephemeral stripe, not EBS.
 
so we can take snapshots of the data and create volumes based on those 
snapshots and restore them where we want to.

https://github.com/synack/tablesnap



?
 
How can I tell when a snapshot is fully complete so I do not have corrupted 
SSTables?

SStables are immutable after they are created. I'm not sure how you're getting 
a snapshot that has corrupted SSTables in it. If you can repro reliably, file a 
JIRA on issues.apache.org.

=Rob

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
In your step 4, be sure you create a consistent EBS snapshot. You may have
pieces of your sstables that have not actually been flushed all the way to
EBS.

See https://github.com/alestic/ec2-consistent-snapshot

ml
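
For illustration, a consistent-snapshot flow along those lines might look like
this (a sketch only, using today's boto3 API rather than anything 2014-era;
the mount point, volume ID, and region are placeholders, and fsfreeze needs
root):

import subprocess
import boto3

DATA_MOUNT = "/var/lib/cassandra"    # placeholder mount point
VOLUME_ID = "vol-0123456789abcdef0"  # placeholder EBS volume

# 1. Flush memtables and hard-link the sstables into a named snapshot.
subprocess.run(["nodetool", "snapshot", "-t", "backup"], check=True)

# 2. Freeze the filesystem so every block is durable on the volume
#    before EBS starts copying.
subprocess.run(["fsfreeze", "--freeze", DATA_MOUNT], check=True)
try:
    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.create_snapshot(VolumeId=VOLUME_ID,
                        Description="cassandra ebs backup")
finally:
    # 3. Unfreeze right away; the EBS snapshot is point-in-time once
    #    the create_snapshot call returns.
    subprocess.run(["fsfreeze", "--unfreeze", DATA_MOUNT], check=True)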


On Fri, Mar 28, 2014 at 3:21 PM, Russ Lavoie ussray...@yahoo.com wrote:

 Thank you for your quick response.

 Is there a way to tell when a snapshot is completely done?


   On Friday, March 28, 2014 1:30 PM, Robert Coli rc...@eventbrite.com
 wrote:
  On Fri, Mar 28, 2014 at 11:15 AM, Russ Lavoie ussray...@yahoo.comwrote:

 We are using cassandra 1.2.10 (With JNA installed) on ubuntu 12.04.3 and
 are running our instances in Amazon Web Services.



  Our cassandra systems data is on an EBS volume


 Best practice for Cassandra on AWS is to run on ephemeral stripe, not EBS.


  so we can take snapshots of the data and create volumes based on those
 snapshots and restore them where we want to.


 https://github.com/synack/tablesnap


 ?


  How can I tell when a snapshot is fully complete so I do not have
 corrupted SSTables?


 SStables are immutable after they are created. I'm not sure how you're
 getting a snapshot that has corrupted SSTables in it. If you can repro
 reliably, file a JIRA on issues.apache.org.

 =Rob






Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Robert Coli
On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie ussray...@yahoo.com wrote:

 Thank you for your quick response.

 Is there a way to tell when a snapshot is completely done?


IIRC, the JMX call blocks until the snapshot completes. It should be done
when nodetool returns.

=Rob
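
One way to double-check completion before snapshotting the volume is to verify
that the snapshot tag exists under every column family directory (a sketch;
the directory layout shown matches 1.2-era defaults and the paths are
placeholders):

import os

DATA_DIR = "/var/lib/cassandra/data"  # placeholder data directory
TAG = "backup"

missing = []
for ks in os.listdir(DATA_DIR):
    ks_path = os.path.join(DATA_DIR, ks)
    if not os.path.isdir(ks_path):
        continue
    for cf in os.listdir(ks_path):
        snap = os.path.join(ks_path, cf, "snapshots", TAG)
        if not os.path.isdir(snap):
            missing.append(snap)

if missing:
    raise RuntimeError("snapshot not complete yet: %s" % missing)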


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Russ Lavoie
Robert,

That is what I thought as well, but apparently something is happening. The
only way I can get away with doing this is adding a 'sleep 60' right after the
nodetool snapshot is executed. I can reproduce this 100% of the time by not
issuing a sleep after nodetool snapshot.

This is the error.

ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 
191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at 
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
at java.io.DataInputStream.readUTF(DataInputStream.java:589)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at 
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
... 11 more



On Friday, March 28, 2014 2:38 PM, Robert Coli rc...@eventbrite.com wrote:
 
On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie ussray...@yahoo.com wrote:

Thank you for your quick response.


Is there a way to tell when a snapshot is completely done?

IIRC, the JMX call blocks until the snapshot completes. It should be done when 
nodetool returns.


=Rob

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
I have a nagging memory of reading about issues with virtualization and not
actually having durable versions of your data even after an fsync (within
the VM).  Googling around led me to this post:
http://petercai.com/virtualization-is-bad-for-database-integrity/

It's possible you're hitting this issue, either with the virtualization
layer, or with EBS itself.  Just a shot in the dark though, other people
would likely know much more than I.



On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie ussray...@yahoo.com wrote:

 Robert,

 That is what I thought as well.  But apparently something is happening.
  The only way I can get away with doing this is adding a sleep 60 right
 after the nodetool snapshot is executed.  I can reproduce this 100% of the
 time by not issuing a sleep after nodetool snapshot.

 This is the error.

 ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java
 (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
 org.apache.cassandra.io.sstable.CorruptSSTableException:
 java.io.EOFException
 at
 org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:108)
 at
 org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
 at
 org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
 at
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
 at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
 at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
 at
 org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.EOFException
 at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
 at java.io.DataInputStream.readUTF(DataInputStream.java:589)
 at java.io.DataInputStream.readUTF(DataInputStream.java:564)
 at
 org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:83)
  ... 11 more


   On Friday, March 28, 2014 2:38 PM, Robert Coli rc...@eventbrite.com
 wrote:
  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie ussray...@yahoo.comwrote:

 Thank you for your quick response.

 Is there a way to tell when a snapshot is completely done?


 IIRC, the JMX call blocks until the snapshot completes. It should be done
 when nodetool returns.

 =Rob





-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
predictable.

Additionally, from a practical standpoint, you may want to back up your
sstables somewhere.  If you use S3, it's easy to pull just the new tables
out via aws-cli tools (s3 sync), to your remote, non-aws server, and not
incur the overhead of routinely backing up the entire dataset.  For a
non-trivial database, this matters quite a bit.


On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael
michael.la...@nytimes.comwrote:

 As I tried to say, EBS snapshots require much care or you get corruption
 such as you have encountered.

 Does Cassandra quiesce the file system after a snapshot using fsfreeze or
 xfs_freeze? Somehow I doubt it...


 On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad j...@jonhaddad.comwrote:

 I have a nagging memory of reading about issues with virtualization and
 not actually having durable versions of your data even after an fsync
 (within the VM).  Googling around lead me to this post:
 http://petercai.com/virtualization-is-bad-for-database-integrity/

 It's possible you're hitting this issue, with with the virtualization
 layer, or with EBS itself.  Just a shot in the dark though, other people
 would likely know much more than I.



 On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie ussray...@yahoo.comwrote:

 Robert,

 That is what I thought as well.  But apparently something is happening.
  The only way I can get away with doing this is adding a sleep 60 right
 after the nodetool snapshot is executed.  I can reproduce this 100% of the
 time by not issuing a sleep after nodetool snapshot.

 This is the error.

 ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java
 (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
 org.apache.cassandra.io.sstable.CorruptSSTableException:
 java.io.EOFException
 at
 org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:108)
 at
 org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
  at
 org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
 at
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
  at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
 at
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
 at
 org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.EOFException
 at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
 at java.io.DataInputStream.readUTF(DataInputStream.java:589)
 at java.io.DataInputStream.readUTF(DataInputStream.java:564)
 at
 org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:83)
  ... 11 more


   On Friday, March 28, 2014 2:38 PM, Robert Coli rc...@eventbrite.com
 wrote:
  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie ussray...@yahoo.comwrote:

 Thank you for your quick response.

 Is there a way to tell when a snapshot is completely done?


 IIRC, the JMX call blocks until the snapshot completes. It should be
 done when nodetool returns.

 =Rob





 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade





-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
As I tried to say, EBS snapshots require much care or you get corruption
such as you have encountered.

Does Cassandra quiesce the file system after a snapshot using fsfreeze or
xfs_freeze? Somehow I doubt it...


On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 I have a nagging memory of reading about issues with virtualization and
 not actually having durable versions of your data even after an fsync
 (within the VM).  Googling around lead me to this post:
 http://petercai.com/virtualization-is-bad-for-database-integrity/

 It's possible you're hitting this issue, with with the virtualization
 layer, or with EBS itself.  Just a shot in the dark though, other people
 would likely know much more than I.




Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
+1 for tablesnap


On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
 predictable.

 Additionally, from a practical standpoint, you may want to back up your
 sstables somewhere.  If you use S3, it's easy to pull just the new tables
 out via aws-cli tools (s3 sync) to your remote, non-AWS server, and not
 incur the overhead of routinely backing up the entire dataset.  For a
 non-trivial database, this matters quite a bit.
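
 As a sketch, that incremental pull could be as simple as (bucket layout
 invented here):

   # Copies only files that are new or changed since the last run.
   aws s3 sync s3://my-backups/cassandra/node1/ /srv/restore/node1/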



Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
Another thing to keep in mind is that if you are hitting the issue I
described, waiting 60 seconds will not absolutely solve your problem; it
will only make it less likely to occur.  If a memtable has been partially
flushed at the 60 second mark, you will end up with the same corrupt sstable.
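
A less fragile wait than a fixed sleep, as a sketch (data directory layout
assumed; and if the fsync issue from earlier in the thread is real, no
amount of waiting at this level will help):

  nodetool snapshot -t backup my_keyspace
  # Poll until the set of hardlinks in the snapshot dirs stops changing,
  # then copy the snapshot off the machine.
  prev=""
  while true; do
      cur=$(find /var/lib/cassandra/data/my_keyspace \
                 -path '*/snapshots/backup/*' | sort | md5sum)
      [ "$cur" = "$prev" ] && break
      prev="$cur"
      sleep 5
  done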


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Possibly losing data with corrupted SSTables

2014-03-14 Thread Robert Coli
On Wed, Feb 12, 2014 at 9:20 AM, Francisco Nogueira Calmon Sobral 
fsob...@igcorp.com.br wrote:

 I've removed the corrupted sstables and 'nodetool repair' ran successfully
 for the column family. I'm not sure whether or not we've lost data.


If you read/write at CL.ONE, there is a non-zero chance that you have lost
data. In practice, this chance is pretty low unless you constantly drop
mutation messages or have an RF of under 3.

=Rob
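
(Not from this thread, but one way to spot-check that risk on a live
cluster: the Dropped section of nodetool tpstats counts the MUTATION
messages each node has discarded.)

  nodetool tpstats | grep -i -A 10 dropped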


Re: Possibly losing data with corrupted SSTables

2014-02-12 Thread Francisco Nogueira Calmon Sobral
Hi, Rahul.

I've removed the corrupted sstables and 'nodetool repair' ran successfully for 
the column family. I'm not sure whether or not we've lost data.

Best regards,
Francisco Sobral



Re: Possibly losing data with corrupted SSTables

2014-02-12 Thread sankalp kohli
You might want to look at this JIRA I filed today:
CASSANDRA-6696 (https://issues.apache.org/jira/browse/CASSANDRA-6696)

You are good if you are fine with data reappearing.



Re: Possibly losing data with corrupted SSTables

2014-01-30 Thread Rahul Menon
Looks like the sstables are corrupt. I don't believe there is a method to
recover those sstables. I would delete them and run a repair to ensure data
consistency.

Rahul



Re: Possibly losing data with corrupted SSTables

2014-01-30 Thread Francisco Nogueira Calmon Sobral
Ok. I'll try this idea with one sstable. But should I delete all the files
associated with it? I mean, there is a difference in the number of files
between the BAD sstable and a GOOD one, as I've already shown:

BAD
--
-rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db

GOOD
-
-rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt

Should I delete those 3 files? Should I run nodetool refresh after the 
operation?

Best regards,
Francisco.


Re: Possibly losing data with corrupted SSTables

2014-01-30 Thread Rahul Menon
Yes, you should delete all files related to cfname-ib-num-extension.db.

Run a repair after the deletion.
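
Concretely, that could look like this sketch for the generation from this
thread (service commands and paths vary by install):

  sudo service cassandra stop
  rm /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-*
  sudo service cassandra start
  nodetool repair Sessions Users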



Possibly losing data with corrupted SSTables

2014-01-29 Thread Francisco Nogueira Calmon Sobral
Dear experts,

We are facing an annoying problem in our cluster.

We have 9 Amazon extra-large Linux nodes, running Cassandra 1.2.11.

The short story is that after moving the data from one cluster to another,
we've been unable to run 'nodetool repair'. It gets stuck due to a
CorruptSSTableException on some nodes and CFs. After looking at some
problematic CFs, we observed that some of them have root permissions instead
of cassandra permissions. Also, their names are different from the 'good'
ones, as we can see below:

BAD
--
-rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db

GOOD
-
-rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt


We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on
this problematic CF, but it has been running for at least two weeks (it is
not frozen) and keeps logging many WARNs while working with the
above-mentioned SSTable:

WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java
(line 57) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Impossible row size 3618452438597849419
... 10 more
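
If the online scrub never finishes, the standalone scrubber that ships with
Cassandra is an alternative, as a sketch (it must run while the node is
stopped; service commands vary by install):

  sudo service cassandra stop
  sstablescrub Sessions Users
  sudo service cassandra start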


1) I do not think that deleting all the data on one node and running
'nodetool rebuild' will work, since we observed that this problem occurs on
all nodes, so we may not be able to restore all the data. What can be done
in this case?

2) Why are the permissions of some sstables 'root'? Is this problem caused
by our manual migration of the data? (see the long story below)


How did we run into this?

The long story is that we tried to move our cluster with sstableloader, but
it was unable to load all the data correctly. Our solution was to put ALL
cluster data onto EACH new node and run 'nodetool refresh'. I performed this
task for each node and each column family sequentially. Sometimes I had to
rename some sstables, because they came from different nodes with the same
name. I don't remember if I ran 'nodetool repair' or even 'nodetool cleanup'
on each node. Apparently, the process was successful, and (almost) all the
data was moved.
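
For a single CF, that load step looks roughly like this sketch (backup path
invented; note the chown, which would avoid exactly the root-owned files
described above):

  cp /backup/Sessions/Users/* /mnt/cassandra/data/Sessions/Users/
  chown -R cassandra:cassandra /mnt/cassandra/data/Sessions/Users
  nodetool refresh Sessions Users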

Unfortunately, three months after we moved, I am unable to perform read
operations on some keys of some CFs. I think that some of these keys belong
to the above-mentioned sstables.

Any insights are welcome.

Best regards,
Francisco Sobral

Re: Possibly losing data with corrupted SSTables

2014-01-29 Thread Rahul Menon
Francisco,

The sstables with *-ib-* in the name are from a previous version of c*. The
*-ib-* naming convention started at c* 1.2.1, but from 1.2.10 onwards I'm
sure it uses the *-ic-* convention. You could try running nodetool
upgradesstables, which should ideally upgrade the sstables from *-ib-* to
*-ic-*.

Rahul
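
As a sketch, scoped to the CF from this thread:

  # Rewrites the old -ib- sstables in the current on-disk format.
  nodetool upgradesstables Sessions Users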



Re: Possibly losing data with corrupted SSTables

2014-01-29 Thread Francisco Nogueira Calmon Sobral
Hi, Rahul.

I've run nodetool upgradesstables only on the problematic CF. It threw the
following exception:

Error occurred while upgrading the sstables for keyspace Sessions
java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271)
at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287)
at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977)
at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191)
... ...
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301)
at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
Caused by: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
... 20 more


Regards,
Francisco

