Re: question on NULL representation in DB?

Suresh Subbiah Fri, 30 Oct 2015 04:39:40 -0700

Hi,

Summarising personal correspondence with Anoop on this question:
"  There are two ways in which Traf puts and detects a null value in HBase:
either a missing value, or a column value with a null-indicator prefix.

During insert, we use the first method and do not insert that column.

But during update, we use the second method and put the null indicator in
as the value of that column.

We create a rowid with null values, or part of a key with null values, by
putting in the null indicator and zeroing out the remainder of the field.
That way two null values will compare equal and null will sort high."
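
To make the two representations concrete, here is a minimal Python sketch. It
is not Trafodion code. The indicator bytes 0x00 and 0xFF are taken from
Selva's message quoted below; the one-byte indicator width, the zero padding
of short key values, and the names decode_cell and encode_key_field are
assumptions made only for illustration.

```python
from typing import Optional

NOT_NULL = b"\x00"  # first byte of a non-null value (per the quoted message)
IS_NULL = b"\xff"   # first byte after an update to NULL (per the quoted message)


def decode_cell(cell: Optional[bytes]) -> Optional[bytes]:
    """Return the column payload, or None if the column is NULL.

    `cell` is None when the cell is absent from the row (hypothetical
    convention for this sketch).
    """
    if cell is None:          # missing cell: NULL as written by an insert
        return None
    if cell[:1] == IS_NULL:   # indicator prefix: NULL as written by an update
        return None
    return cell[1:]           # strip the indicator byte


def encode_key_field(value: Optional[bytes], width: int) -> bytes:
    """Encode one fixed-width key field with a leading null indicator."""
    if value is None:
        # Null indicator followed by a zeroed field: all NULL keys are
        # byte-identical and sort after every non-null key.
        return IS_NULL + b"\x00" * width
    if len(value) > width:
        raise ValueError("value longer than key field")
    # Zero padding of short values is an assumption of this sketch.
    return NOT_NULL + value.ljust(width, b"\x00")
```

With this encoding, encode_key_field(None, 4) compares greater than the
encoding of any non-null four-byte value, which is the "null will sort high"
property described above.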

Are we thinking that if an update that sets a column value to NULL were
implemented as an HBase Delete of that cell, then predicate pushdown would
not need to check for null values again? It will be some work to split an
Update into a Put of the non-null values followed by a Delete of the null
values, and it will need an expression to be evaluated at update time. I
suppose we have a choice: pay the cost of expression evaluation during
select or during update.
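
As a rough illustration of that split, the sketch below partitions the
evaluated SET values of an update into columns to Put and columns to Delete.
It is only a sketch under stated assumptions: split_update is a hypothetical
helper, the values are assumed to be already evaluated (that evaluation is the
update-time cost mentioned above), and no HBase client calls are made.

```python
from typing import Dict, List, Optional, Tuple


def split_update(
    assignments: Dict[str, Optional[bytes]]
) -> Tuple[Dict[str, bytes], List[str]]:
    """Partition evaluated SET values into puts and deletes.

    `assignments` maps column name to its new value, with None meaning
    the column is being set to NULL.
    """
    puts: Dict[str, bytes] = {}
    deletes: List[str] = []
    for column, value in assignments.items():
        if value is None:
            deletes.append(column)   # NULL: remove the cell entirely
        else:
            puts[column] = value     # non-null: write the new value
    return puts, deletes


# Example: one column gets a new value, another is set to NULL.
puts, deletes = split_update({"c1": b"abc", "c2": None})
```

If every NULL were stored as an absent cell in this way, a pushed-down
predicate would never encounter a null-indicator prefix.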

Thanks

Suresh

On Thu, Oct 29, 2015 at 1:34 PM, Selva Govindarajan <
[email protected]> wrote:

> By default, null values are not inserted into HBase. If the column is
> nullable, an additional first byte indicates whether the column value is
> null. When a value is inserted into a nullable column, the first byte is
> always 0x00. When the column value is updated to null, the existing column
> value is replaced with one whose first byte is 0xFF, because it is not
> possible to switch to deleting the cell value in the midst of data
> execution.
>
> Selva
>
> -----Original Message-----
> From: Eric Owhadi [mailto:[email protected]]
> Sent: Thursday, October 29, 2015 11:06 AM
> To: [email protected]
> Subject: question on NULL representation in DB?
>
> Reading the code, I have a hard time understanding the various ways NULLs
> are represented in the DB for the non-aligned format.
>
> I see comments in the code suggesting that nullable columns have a first
> value byte indicating whether the value is null, but I also see special
> cases all over the place that treat a null as a totally absent cell.
>
> The former method (adding a first byte to indicate a null) has
> consequences for predicate pushdown: predicates must be re-evaluated at
> the Trafodion layer to deal with null semantics.
>
> But I am not sure why we have this special situation of coding null with a
> byte, instead of always dealing with nulls as being “absent” cells. I am
> sure there is a reason, but I just could not figure it out…
>
> Can someone help?
>
> Eric
>
