Re: Missing data

Bryan Holladay Mon, 15 Jun 2015 10:14:00 -0700

There's your problem: you're using the DataStax Java driver :) I just ran
into this issue in the last week and it was incredibly frustrating. If you
are doing a simple loop on a "select *" query, then the DataStax Java
driver will only process about 2^31 rows (i.e. the Java Integer max,
2,147,483,647) before it stops without any error or output in the logs. The
fact that you said you have about 2 billion rows and are seeing
missing data is a red flag.


The only way around this that I found is to do your "select *" in chunks based
on the token range (see this gist for an example:
https://gist.github.com/baholladay/21eb4c61ea8905302195 )
Just loop, and after every 100 million rows make a new query "select * from
TABLE where token(key) > lastToken"
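
A minimal sketch of the chunking idea, not the code from the gist: instead of
tracking lastToken, it splits the whole token range into fixed sub-ranges up
front and builds one query per sub-range. It assumes the cluster uses the
default Murmur3Partitioner (tokens are signed 64-bit values); the class name
TokenRangeQueries and the table and key names passed in are hypothetical, and
the driver calls that would actually run each query are left out.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the DataStax driver.
final class TokenRangeQueries {
    // Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Returns chunks + 1 boundaries that split [MIN, MAX] into contiguous ranges.
    static List<Long> boundaries(int chunks) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        BigInteger width = MAX.subtract(MIN); // BigInteger avoids long overflow
        BigInteger n = BigInteger.valueOf(chunks);
        List<Long> out = new ArrayList<>();
        for (int i = 0; i <= chunks; i++) {
            BigInteger b = MIN.add(width.multiply(BigInteger.valueOf(i)).divide(n));
            out.add(b.longValueExact());
        }
        return out;
    }

    // Builds one CQL string per sub-range; each row falls in exactly one range.
    static List<String> queries(String table, String key, int chunks) {
        List<Long> b = boundaries(chunks);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            // The first range includes its lower bound, the others exclude it.
            String lowerOp = (i == 0) ? " >= " : " > ";
            out.add("SELECT * FROM " + table
                    + " WHERE token(" + key + ")" + lowerOp + b.get(i)
                    + " AND token(" + key + ") <= " + b.get(i + 1));
        }
        return out;
    }
}
```

Each generated string would then be executed as its own query, so no single
result set has to iterate over anywhere near 2^31 rows. Pick the number of
chunks from your row count, e.g. a few dozen for 2 billion rows.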

Thanks,
Bryan




On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay <
jean.tremb...@zen-innovations.com> wrote:

>  Dear all,
>
>  I identified a bit more closely the root cause of my missing data.
>
>  The problem is occurring when I use
>
>   <dependency>
>     <groupId>com.datastax.cassandra</groupId>
>     <artifactId>cassandra-driver-core</artifactId>
>     <version>2.1.6</version>
>   </dependency>
>
>  on my client against Cassandra 2.1.6.
>
>  I did not have the problem when I was using the driver 2.1.4 with C*
> 2.1.4.
> Interestingly enough, I don’t have the problem with the driver 2.1.4 with
> C* 2.1.6 either.
>
>  So as far as I can locate the problem, I would say that version
> 2.1.6 of the driver is not working properly and is losing some of my
> records.
>
>  ——————
>
>  As far as my tombstones are concerned, I don’t understand their origin.
> I removed all locations in my code where I delete items, and I do not use
> TTL anywhere (I don’t need this feature in my project).
>
>  And yet I have many tombstones building up.
>
>  Is there another origin for tombstones besides TTL and deleting items?
> Could the compaction of LeveledCompactionStrategy be the origin of them?
>
>  @Carlos thanks for your guidance.
>
>  Kind regards
>
>  Jean
>
>
>
>  On 15 Jun 2015, at 11:17 , Carlos Rolo <r...@pythian.com> wrote:
>
>  Hi Jean,
>
>  The problem behind that warning is that you are reading too many tombstones
> per request.
>
>  If you do have tombstones without doing DELETE, it is probably because you
> TTL'ed the data when inserting (by mistake? Or did you set
> default_time_to_live in your table?). You can use nodetool cfstats to see
> how many tombstones per read slice you have. This is probably also the
> cause of your missing data: the data was tombstoned, so it is not available.
>
>
>
>    Regards,
>
>  Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
>  rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
>> Hi,
>>
>>  I have reloaded the data in my cluster of 3 nodes RF: 2.
>> I have loaded about 2 billion rows in one table.
>> I use LeveledCompactionStrategy on my table.
>> I use version 2.1.6.
>> I use the default cassandra.yaml; only the IP address for seeds and
>> the throughput have been changed.
>>
>>  I loaded my data with simple insert statements. This took a bit more
>> than one day to load the data… and one more day to compact the data on all
>> nodes.
>> For me this is quite acceptable since I should not be doing this again.
>> I have done this with previous versions like 2.1.3 and others and I
>> basically had absolutely no problems.
>>
>>  Now I read the log files on the client side; there I see no warnings and
>> no errors.
>> On the node side I see many WARN messages, all related to tombstones,
>> but there are no ERRORs.
>>
>>  My problem is that I see *many missing records* in the DB, and I
>> have never observed this with previous versions.
>>
>>  1) Is this a known problem?
>> 2) Do you have any idea how I could track down this problem?
>> 3) What is the meaning of this WARNING (the only type of ERROR | WARN I
>> could find)?
>>
>>  WARN  [SharedPool-Worker-2] 2015-06-15 10:12:00,866
>> SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in
>> gttdata.alltrades_co_rep_pcode for key: D:07 (see
>> tombstone_warn_threshold). 5000 columns were requested,
>> slices=[388:201001-388:201412:!]
>>
>>
>>  4) Is it possible to have tombstones when we make no DELETE statements?
>>
>>  I’m lost…
>>
>>  Thanks for your help.
>>
>
>
