[
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mck SembWever updated CASSANDRA-3150:
-------------------------------------
Attachment: fullscan-example1.log
Here's the debug output from a "full scan" job. It scans data over a full year
(and since the cluster's ring range only holds 3 months of data, this job
guarantees a full scan).
In the debug log you can see the splits.
{{`nodetool ring`}} gives
{noformat}
Address        DC   Rack  Status  State   Load      Owns    Token
                                                            Token(bytes[55555555555555555555555555555554])
152.90.241.22  DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23  DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa])
152.90.241.24  DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[55555555555555555555555555555554])
{noformat}
The problematic split ends up being
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc',
endToken='2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
dataNodes=[cassandra02.finn.no]}{noformat}
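To see where this split sits on the ring, here is a minimal Java sketch. It is not Cassandra code: the class {{RingLookup}} and its methods are hypothetical names made up for illustration. It assumes that ByteOrderedPartitioner tokens sort as unsigned bytes in lexicographic order, and that a node owns the range (previous token, its own token]. Under those assumptions both ends of the split fall in the range owned by 152.90.241.23, the node that {{nodetool ring}} reports with 63.22 GB.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Cassandra.
public class RingLookup {

    // Decode an even-length hex string (e.g. "2aaa...") into raw bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Lexicographic comparison on unsigned bytes; a strict prefix sorts first.
    // Assumption: this mirrors how byte-ordered tokens are compared.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Owner of a token: the node with the smallest token >= the given one,
    // wrapping around to the first node when no such token exists.
    static String owner(TreeMap<byte[], String> ring, byte[] token) {
        Map.Entry<byte[], String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // The three-node ring from the nodetool output above.
    static TreeMap<byte[], String> exampleRing() {
        TreeMap<byte[], String> ring = new TreeMap<>(RingLookup::compare);
        ring.put(hex("00"), "152.90.241.22");
        ring.put(hex("2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"), "152.90.241.23");
        ring.put(hex("55555555555555555555555555555554"), "152.90.241.24");
        return ring;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> ring = exampleRing();
        // Start and end token of the problematic split.
        System.out.println(owner(ring, hex("0528cbe0b2b5ff6b816c68b59973bcbc")));
        System.out.println(owner(ring, hex("2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")));
    }
}
```

The sorted map keeps the lookup to a single {{ceilingEntry}} call. The wrap-around branch covers tokens that sort after the last token on the ring.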
> ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of
> whack)
> ----------------------------------------------------------------------------------
>
> Key: CASSANDRA-3150
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 0.8.4, 0.8.5
> Reporter: Mck SembWever
> Assignee: Mck SembWever
> Priority: Critical
> Fix For: 0.8.6
>
> Attachments: CASSANDRA-3150.patch, Screenshot-Counters for
> task_201109212019_1060_m_000029 - Mozilla Firefox.png, Screenshot-Hadoop map
> task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png,
> attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log,
> fullscan-example1.log
>
>
> From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
> {quote}
> bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
> bq. CFIF's inputSplitSize=196608
> bq. 3 map tasks (of 4013) are still running after reading 25 million rows.
> bq. Can this be a bug in StorageService.getSplits(..) ?
> getSplits looks pretty foolproof to me but I guess we'd need to add
> more debug logging to rule out a bug there for sure.
> I guess the main alternative would be a bug in the recordreader paging.
> {quote}