Re:Re: Re: Why do we need an empty column when doing upsert?

William Tue, 22 Dec 2015 00:40:09 -0800

Hi Gabriel,


Thanks a lot for your reply. 
I was confused. I misunderstood 'not null' constraint as 'required at each 
upsert'. 


-William


At 2015-12-22 15:25:40, "Gabriel Reid" <[email protected]> wrote:
>Hi William,
>
>Yes, all your tests there look correct, and it looks like you've got a
>good understanding on the underlying principles of how scans are
>working in Phoenix vs HBase.
>
>About the use of "not null" constraints, you're correct about data
>being written by non-Phoenix clients being an issue. However, the
>bigger issue is doing upsert statements in general. Phoenix doesn't
>know if an UPSERT statement is updating an existing row, or adding a
>new row. If an update of an existing row is being done, then it would
>be possible to only upsert a subset of the total columns for the row,
>but an upsert of a new row would have to include all columns.
>
>It would be possible to determine if each upsert was a new row by
>reading each row before writing it, but this would be very costly in
>terms of performance.
>
>- Gabriel
>
>On Tue, Dec 22, 2015 at 3:22 AM, William <[email protected]> wrote:
>> Hi Gabriel:
>>
>>
>> Now I think I understand why an empty column is necessary. I did the 
>> following tests:
>>
>>
>> 1. by HBASE native API: after I deleted the last attribute column of a 
>> specified row, the whole row was deleted.
>>     and this is exactly what you said 'the row key is not stored at all 
>> unless there is at least one column stored.'
>>
>>
>> 2. by Phoenix client:
>>     * create table tt ( pk varchar not null primary key, cf.b varchar, cf.c 
>> varchar);
>>     * upsert into tt values ( 'this is pk');
>>       now, do "scan 'tt' " over hbase shell, I got a row with an empty 
>> column '_0', and this row would not have been existed if phoenix did not 
>> insert this empty column and of course hbase native API will not allow u to 
>> do this.
>>
>> 3. by phoenix client:
>>     even if we do insert not null non pk column in an upsert, an empty 
>> column is ALWAYS necessary because columns can be deleted.
>>    *  upsert into tt values ( 'this is pk', 'bbbbb', 'ccccc');
>>    *  upsert into tt values ( 'this is pk', null, null);
>>    * or alter table tt drop column cf.b, cf.c
>>       if we do not insert the empty column at 1st upsert, then after 2nd 
>> upsert, the whole row will disappear. But for SQL, we expect a row only with 
>> the PK value.
>>
>>
>> 4. why we cannot add not null constraint on non pk column ?
>>     I tried this : create table tt (pk varchar not null primary key, cf.b 
>> varchar not null);
>>     and it raised an exception : Invalid not null constraint on non primary 
>> key column columnName=TT.B
>>     I figured out one reason : the data of a phoenix table might be written 
>> by hbase native api, which does not support not null constraint on non pk 
>> column.
>>    Is it right? Is it the primary reason?
>>
>>
>> thanks
>> - William
>>
>> At 2015-12-21 22:46:30, "Gabriel Reid" <[email protected]> wrote:
>>>Hi William,
>>>
>>>The empty column is needed to ensure that a given column is available
>>>for all rows.
>>>
>>>As you may know, data is stored in HBase as KeyValues, meaning that
>>>the full row key is stored for each column value. This also implies
>>>that the row key is not stored at all unless there is at least one
>>>column stored.
>>>
>>>Now consider JDBC row which has an integer primary key, and several
>>>columns which are all null. In order to be able to store the primary
>>>key, a KeyValue needs to be stored to show that the row is present at
>>>all. This column is represented by the empty column that you've
>>>noticed. This allows doing a "SELECT * FROM TABLE" and receiving
>>>records for all rows, even those whose non-pk columns are null.
>>>
>>>The same issue comes up even if only one column is null for some (or
>>>all) records. A scan over Phoenix will include the empty column to
>>>ensure that rows that only consist of the primary key (and have null
>>>for all non-key columns) will be included in a scan result.
>>>
>>>- Gabriel
>>>
>>>On Mon, Dec 21, 2015 at 2:58 PM, 杨晗 <[email protected]> wrote:
>>>> hi all:
>>>>      I'm reading phoenix src code recently, and i found 
>>>> PRowImpl.toRowMutations() always adds an empty column named '_0' for 
>>>> non-delete upsert. Why?
>>>>      I read the comment but i didn't quite understand it. Might someone 
>>>> give me an example that illustates why an empty column is ALWAYS necessary?
>>>>
>>>>
>>>>     Further more, I have to access a phoenix table by both phoenix client 
>>>> and hbase API in some cases. If i do not add this empty column explicitly 
>>>> by hbase API, it is ok  if i read this row by phoenix?
>>>>
>>>>
>>>> Thanks
>>>> -William

Re:Re: Re: Why do we need an empty column when doing upsert?

Reply via email to