RE: question on NULL representation in DB?

Selva Govindarajan Fri, 30 Oct 2015 10:02:56 -0700

Consider the following test case

>>select * from mytable2 ;


C1           C2
-----------  -----------

          1            1
          2            ?
          3            3

--- 3 row(s) selected.

>>prepare s1 from update mytable2 set c1 = c2 ;

--- SQL command prepared.
>>explain options 'f' s1 ;

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

1    .    2    root                  o         x
1.00E+002
.    .    1    trafodion_update                MYTABLE2
1.00E+002

--- SQL operation complete.
>>execute s1 ;

--- 3 row(s) updated.
>>select * from mytable2 ;
..

C1           C2
-----------  -----------

          1            1
          ?            ?
          3            3

--- 3 row(s) selected.

Ideally we would want the plan  below for better performance on larger
tables because vsbb update kicks in. Then we would want the data flow to
take different paths requiring a plan change. Hence I said in my earlier
message, it is not possible  to switch to delete in the midst of data flow.

>>control query shape  nested_join(cut, cut ) ;;
--- SQL operation complete.
>>prepare s1 from update mytable2 set c1 = c2 ;

--- SQL command prepared.
>>explain options 'f' s1 ;
..

LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
---- ---- ---- --------------------  --------  --------------------  ---------

3    .    4    root                            x
1.00E+002
1    2    3    tuple_flow
1.00E+002
.    .    2    trafodion_vsbb_updat            MYTABLE2
1.00E+000
.    .    1    trafodion_scan                  MYTABLE2
1.00E+002

--- SQL operation complete.
>>

Selva

-----Original Message-----
From: Suresh Subbiah [mailto:[email protected]]
Sent: Friday, October 30, 2015 4:39 AM
To: [email protected]
Subject: Re: question on NULL representation in DB?

Hi,

Summarising personal correspondence with Anoop on this question "  there are
2 ways in which traf puts and detect null value in hbase:
either a missing value or a column

value with null indicator prefix.



During insert, we use the first method of not inserting that column.

But during update, we use the second method of putting in null indicator as
the value of that column.



We create rowid with null values or part of key with null values by putting
in null indicator and zeroing

out remainder of the field. That way 2 null values will be compared equal
and null will sort high."


Are we thinking that if we implemented an update which sets a column value
to be NULL as a HBase Delete of that cell then predicate pushdown need not
check for null values again? It will be some work to split out an Update as
Put of NonNull values and then a Delete of null values. Will need an
expression to be evaluated at Update time. I suppose we have a choice, pay
the cost of expression eval during select or during update.


Thanks

Suresh

On Thu, Oct 29, 2015 at 1:34 PM, Selva Govindarajan <
[email protected]> wrote:

> By default, the null values are not inserted into hbase. If the column
> is nullable, the first additional byte determines if the column value
> is null or not. When a value is inserted into nullable column the
> first byte is always 0x00. When the column value is updated to null,
> the existing column value will be replaced with 0xFF in the first byte
> because it is not possible to switch to delete cell value in the midst of
> data execution.
>
> Selva
>
> -----Original Message-----
> From: Eric Owhadi [mailto:[email protected]]
> Sent: Thursday, October 29, 2015 11:06 AM
> To: [email protected]
> Subject: question on NULL representation in DB?
>
> Reading the code, I have a hard time understanding the various ways
> NULLs are represented in the DB for non-aligned format.
>
> I see comments in the code suggesting that nullable columns have the
> first value byte representing if the value is null, but I also see
> special cases all over the place that take care of null as being totally
> absent cells.
>
> The former method (adding a first byte indicating a null) having
> consequences on predicate push down -> need to re-do predicate
> evaluation at trafodion layer to deal with null semantic.
>
>
>
> But I am not sure why we have this special situation of coding null
> with a byte, instead of always dealing with nulls as being “absent”
> cell? I am sure there is a reason, but I just could not figure it out…
>
> Someone can help?
>
> Eric
>

RE: question on NULL representation in DB?

Reply via email to