Thanks for the info, James. I've created PHOENIX-1108 to look into this further.

I've also done some more experimentation to understand the
implications of KEEP_DELETED_CELLS, and have noticed that this is
primarily an issue for me at the moment because I'm setting the
VERSIONS attribute to Integer.MAX_VALUE. Combined with
KEEP_DELETED_CELLS=true, this means that I can never fully delete
data in my current setup.
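
To make the interaction concrete, here is a minimal sketch in Python.
It is a toy model of my understanding of the retention rules, not
HBase code: the function name major_compact, the (timestamp, kind)
cell representation, and the rule that deleted puts count towards the
version limit are all assumptions made for illustration. TTL is
ignored entirely.

```python
import sys


def major_compact(cells, max_versions, keep_deleted_cells):
    """Toy model of what survives a major compaction for ONE column.

    cells: list of (timestamp, kind) tuples, kind is "put" or "delete".
    A "delete" marker masks every put with timestamp <= its own.
    Assumption: this is a simplification, not the real HBase logic.
    """
    puts = sorted((ts for ts, kind in cells if kind == "put"), reverse=True)
    deletes = sorted((ts for ts, kind in cells if kind == "delete"), reverse=True)

    if keep_deleted_cells:
        # Deleted puts are retained, limited only by the version count.
        kept_puts = puts[:max_versions]
        # A marker is retained while it still covers a retained put.
        kept_deletes = [d for d in deletes if any(p <= d for p in kept_puts)]
    else:
        # Masked puts and all delete markers are dropped.
        newest_delete = deletes[0] if deletes else None
        live = [p for p in puts if newest_delete is None or p > newest_delete]
        kept_puts = live[:max_versions]
        kept_deletes = []

    survivors = [(ts, "put") for ts in kept_puts]
    survivors += [(ts, "delete") for ts in kept_deletes]
    return sorted(survivors)


# Two versions written at t=1 and t=2, then the row is deleted at t=3.
history = [(1, "put"), (2, "put"), (3, "delete")]

# VERSIONS = "unbounded" plus KEEP_DELETED_CELLS=true: nothing is dropped.
print(major_compact(history, sys.maxsize, True))
# KEEP_DELETED_CELLS=false: the delete actually clears the data.
print(major_compact(history, sys.maxsize, False))
# A small VERSIONS value bounds what is retained even with the flag on.
print(major_compact(history, 1, True))
```

In this model, the only things that ever bound retention with
KEEP_DELETED_CELLS=true are the version limit and TTL, so an
effectively unlimited VERSIONS setting with no TTL leaves nothing to
trigger cleanup.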

- Gabriel


On Wed, Jul 23, 2014 at 8:42 PM, James Taylor <jamestay...@apache.org> wrote:
> Good question, Gabriel. I believe that the deleted cells are cleaned
> up after a second major compaction with the KEEP_DELETED_CELLS option
> enabled. Lars H. implemented this option, so he can comment more, but
> AFAIK he couldn't figure out how to get them to be collected on the
> first major compaction. IMHO, this seems like a bug (but what do I
> know, I'm not an HBase committer :-) ).
>
> KEEP_DELETED_CELLS is required for flashback or point-in-time
> queries. IMHO, without this option, HBase doesn't really work
> correctly. Though you might argue "we never do that" and turn it
> off, under the covers Phoenix is doing point-in-time queries. If you
> have a query that starts at t1 and runs until t5, it won't see data
> inserted after t1. Say a delete was done on a row at t2. Without
> KEEP_DELETED_CELLS set to true, your query could see the effect of
> this delete, even though it happened after t1.
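
The t1/t2 scenario above can be sketched with a small Python model.
This is a simplification based on my reading of the behavior described
here, not HBase or Phoenix code: scan_as_of and the (timestamp, kind)
tuples are hypothetical, and the assumption is that without
KEEP_DELETED_CELLS a delete marker masks older puts regardless of the
scan's upper time bound.

```python
def scan_as_of(cells, as_of, keep_deleted_cells):
    """Return the timestamp of the put a scan capped at `as_of` would see.

    cells: list of (timestamp, kind) tuples, kind is "put" or "delete".
    Returns None when no put is visible.
    Assumption: toy model of the visibility rules, not the real scanner.
    """
    if keep_deleted_cells:
        # Markers newer than the scan's upper bound are ignored.
        deletes = [ts for ts, kind in cells if kind == "delete" and ts <= as_of]
    else:
        # Markers apply even if they are newer than the scan's upper bound.
        deletes = [ts for ts, kind in cells if kind == "delete"]

    newest_delete = max(deletes, default=-1)
    visible = [
        ts
        for ts, kind in cells
        if kind == "put" and newest_delete < ts <= as_of
    ]
    return max(visible, default=None)


# Row written at t=0; the query starts at t1=1; the row is deleted at t2=2.
row = [(0, "put"), (2, "delete")]

# With the flag on, the query still sees the row as of t1.
print(scan_as_of(row, as_of=1, keep_deleted_cells=True))
# With the flag off, the later delete leaks into the query.
print(scan_as_of(row, as_of=1, keep_deleted_cells=False))
```
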
>
> Perhaps the MVCC used by HBase should (does?) take care of this
> automatically without us setting a max on the scan time range, but I'm
> not sure. If it does, then we could likely not have this be the
> default. We'd need to test this with the new ChunkedResultIterator as
> well.
>
> Maybe file a JIRA for further investigation?
>
> Thanks,
> James
>
> On Wed, Jul 23, 2014 at 7:09 AM, Gabriel Reid <gabriel.r...@gmail.com> wrote:
>> Hi,
>>
>> I noticed that HColumnDescriptor.KEEP_DELETED_CELLS is enabled by
>> default on new Phoenix tables. This seems like a bit of an unexpected
>> default, as it means (at least as far as I understand it) that data
>> deleted with delete statements will never actually be cleared, even
>> after a major compaction.
>>
>> Can anyone let me know what the reasoning is behind this? Any
>> functional requirement within Phoenix that makes use of this default
>> property (i.e. if I disable it in my DDL, is there anything that we
>> know won't work then)? And then going further, is this something we
>> definitely want to keep as a default?
>>
>> Thanks,
>>
>> Gabriel
