Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 10:14 AM, Jonathan Ellis  wrote:

> On Mon, Mar 8, 2010 at 12:07 PM, Erik Holstad wrote:
> > So why is it again that the value field in the Column cannot be null if
> > it is not the value field in the map, but just a part of the value field?
>
> Because without a compelling reason to allow nulls, the best policy is
> not to do so.
>
For me this is about memory usage, I guess, so I was just curious whether
there was a good reason for using more than needed, and I guess best policy
is a reason for that.

> > All of this makes total sense, I'm wondering about use cases where you
> > want to get an empty row when you don't know if it has been deleted or
> > not.
>
> If you're saying, "I understand that doing X would be Really
> Inefficient, but I want you to do it anyway because of some use case
> that nobody actually needs so far," then I think you have your answer.
>
> If that is not what you are asking then you'll need to give me a
> concrete example because I don't understand the question.
>
Well, I cannot say that I understand all of this, since I'm not getting it :)
But for me, when you do a range query you want to know what data you have to
work with in those rows, and you are usually not too interested in the empty
ones. And the reason for not returning empty ones would be to save IO.


> -Jonathan
>



-- 
Regards Erik


Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 12:07 PM, Erik Holstad  wrote:
> So why is it again that the value field in the Column cannot be null if it
> is not the value field in the map, but just a part of the value field?

Because without a compelling reason to allow nulls, the best policy is
not to do so.

> All of this makes total sense, I'm wondering about use cases where you want
> to get an empty row when you don't know if it has been deleted or not.

If you're saying, "I understand that doing X would be Really
Inefficient, but I want you to do it anyway because of some use case
that nobody actually needs so far," then I think you have your answer.

If that is not what you are asking then you'll need to give me a
concrete example because I don't understand the question.

-Jonathan


Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 9:30 AM, Jonathan Ellis  wrote:

> On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad wrote:
> > I was probably a little bit unclear here. I'm wondering about the two
> > byte[] in Column. One for name and one for value. I was under the
> > impression that the skiplistmap wraps the Columns, not that the name and
> > the value are themselves inserted into a map?
>
> The column name is the key in one such map, yes.
>
So why is it again that the value field in the Column cannot be null if it
is not the value field in the map, but just a part of the value field?

>
> >> > is it really that expensive to check if the list is empty before
> >> > returning that row
> >>
> >> Yes, because you have to check the entire row, which may be much
> >> larger than the given predicate.
> >
> > That makes sense, but why would you be interested in the rows present
> > outside your specified predicate?
>
> Because get_range_slice says, "apply this predicate to the range of
> rows given," meaning, if the predicate result is empty, we have to
> include an empty result for that row key.  It is perfectly valid to
> perform such a query returning empty column lists for some or all
> keys, even if no deletions have been performed.  So to special case
> leaving out result entries for deletions, we have to check the entire
> rest of the row to make sure there is no undeleted data anywhere else
> either (in which case leaving the key out would be an error).
>
All of this makes total sense, I'm wondering about use cases where you want
to get an empty row when you don't know if it has been deleted or not.


-- 
Regards Erik


Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad  wrote:
> I was probably a little bit unclear here. I'm wondering about the two byte[]
> in Column. One for name and one for value. I was under the impression that
> the skiplistmap wraps the Columns, not that the name and the value are
> themselves inserted into a map?

The column name is the key in one such map, yes.
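
As a minimal sketch of that arrangement (an illustration, not the actual
ColumnFamily code): the column name is the map key and the whole Column is
the map value, so the value byte[] is only a field of the mapped object. The
names ColumnMapSketch, Column, BYTE_ORDER, newColumns, add and column are
hypothetical, and the byte-wise comparator is an assumption about ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch; none of these names come from Cassandra itself.
class ColumnMapSketch {
    // A column holds two byte arrays: its name and its value.
    static final class Column {
        final byte[] name;
        final byte[] value;

        Column(byte[] name, byte[] value) {
            this.name = name;
            this.value = value;
        }
    }

    // Assumed ordering: unsigned lexicographic comparison of the name bytes.
    static final Comparator<byte[]> BYTE_ORDER = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    };

    // The map is keyed by column name; the Column object is the mapped value.
    static ConcurrentSkipListMap<byte[], Column> newColumns() {
        return new ConcurrentSkipListMap<>(BYTE_ORDER);
    }

    static void add(ConcurrentSkipListMap<byte[], Column> columns, Column c) {
        columns.put(c.name, c);
    }

    // Convenience helper for building a column from strings.
    static Column column(String name, String value) {
        return new Column(name.getBytes(StandardCharsets.UTF_8),
                          value.getBytes(StandardCharsets.UTF_8));
    }
}
```

With a map, a column can be looked up directly by its name bytes; a sorted
set of columns would need a probe Column object to search with.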

>> > is it really that expensive to check if the list is empty before
>> > returning that row
>>
>> Yes, because you have to check the entire row, which may be much
>> larger than the given predicate.
>
> That makes sense, but why would you be interested in the rows present
> outside your specified predicate?

Because get_range_slice says, "apply this predicate to the range of
rows given," meaning, if the predicate result is empty, we have to
include an empty result for that row key.  It is perfectly valid to
perform such a query returning empty column lists for some or all
keys, even if no deletions have been performed.  So to special case
leaving out result entries for deletions, we have to check the entire
rest of the row to make sure there is no undeleted data anywhere else
either (in which case leaving the key out would be an error).
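
The semantics described above can be sketched with a toy in-memory model.
This is an illustration under stated assumptions, not Cassandra's
implementation: a row is a sorted map from column name to a "live" flag
(false stands for a deleted column), the predicate is an inclusive column
name range, and RangeSliceSketch, rangeSlice, hasLiveColumns and sampleRows
are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of range-slice semantics; not Cassandra code.
class RangeSliceSketch {
    // Apply the column range to every row. Only the requested slice of each
    // row is read, and a row with no live match still gets an (empty) entry.
    static Map<String, List<String>> rangeSlice(
            NavigableMap<String, NavigableMap<String, Boolean>> rows,
            String startCol, String endCol) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, NavigableMap<String, Boolean>> row : rows.entrySet()) {
            List<String> live = new ArrayList<>();
            for (Map.Entry<String, Boolean> col
                    : row.getValue().subMap(startCol, true, endCol, true).entrySet()) {
                if (col.getValue()) {
                    live.add(col.getKey());
                }
            }
            result.put(row.getKey(), live);
        }
        return result;
    }

    // Deciding whether a key could be left out needs the whole row, not just
    // the slice: any live column anywhere means the key must stay.
    static boolean hasLiveColumns(NavigableMap<String, Boolean> row) {
        return row.containsValue(true);
    }

    // Sample data: k2 has a live column "z" outside the range [a, b];
    // k3 has only a deleted column.
    static NavigableMap<String, NavigableMap<String, Boolean>> sampleRows() {
        NavigableMap<String, NavigableMap<String, Boolean>> rows = new TreeMap<>();
        NavigableMap<String, Boolean> k1 = new TreeMap<>();
        k1.put("a", true);
        k1.put("b", true);
        NavigableMap<String, Boolean> k2 = new TreeMap<>();
        k2.put("a", false);
        k2.put("z", true);
        NavigableMap<String, Boolean> k3 = new TreeMap<>();
        k3.put("a", false);
        rows.put("k1", k1);
        rows.put("k2", k2);
        rows.put("k3", k3);
        return rows;
    }
}
```

In this model, slicing the sample rows on [a, b] gives empty lists for both
k2 and k3. Telling them apart takes the full-row check in hasLiveColumns,
which is the extra work the special case would require.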


Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 9:10 AM, Jonathan Ellis  wrote:

> On Mon, Mar 8, 2010 at 11:07 AM, Erik Holstad wrote:
> > Why is it that null column values are not allowed?
>
> It's semantically unnecessary and potentially harmful at an
> implementation level.  (Many java Map implementations can't
> distinguish between a null key and a key that is not present.)
>
I was probably a little bit unclear here. I'm wondering about the two byte[]
in Column. One for name and one for value. I was under the impression that
the skiplistmap wraps the Columns, not that the name and the value are
themselves inserted into a map?


>
> > What is the reason for using a ConcurrentSkipListMap for columns_ in
> > ColumnFamily compared to using the set version and use the comparator to
> > sort on the name field in IColumn?
>
> ?
>
> > For the call get_range_slice() you get all the rows returned even though
> > they might have been deleted,
>
> Yes, that is the point.
>
> > is it really that expensive to check if the list is empty before
> > returning that row
>
> Yes, because you have to check the entire row, which may be much
> larger than the given predicate.
>
That makes sense, but why would you be interested in the rows present
outside your specified predicate?

>
> -Jonathan
>



-- 
Regards Erik


Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 11:07 AM, Erik Holstad  wrote:
> Why is it that null column values are not allowed?

It's semantically unnecessary and potentially harmful at an
implementation level.  (Many java Map implementations can't
distinguish between a null key and a key that is not present.)
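
To illustrate the point about Java maps, here is a minimal sketch. It reads
the remark as being about a null value stored under a key, which get() alone
cannot tell apart from a missing key; that reading is an assumption. The
names NullLookupDemo, getIsAmbiguous and skipListRejectsNullValue are
hypothetical and not part of Cassandra. The second method shows that
java.util.concurrent.ConcurrentSkipListMap refuses null values outright.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative only; these names are hypothetical, not Cassandra code.
class NullLookupDemo {
    // True when get() returns null both for a key mapped to null and for a
    // key that was never inserted, so only containsKey() can tell them apart.
    static boolean getIsAmbiguous() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);           // HashMap permits null values
        String a = map.get("present");      // null: key exists, value is null
        String b = map.get("missing");      // null: key does not exist
        return a == null && b == null
                && map.containsKey("present")
                && !map.containsKey("missing");
    }

    // True when ConcurrentSkipListMap rejects a null value with a
    // NullPointerException, as its documentation specifies.
    static boolean skipListRejectsNullValue() {
        Map<String, String> map = new ConcurrentSkipListMap<>();
        try {
            map.put("name", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(getIsAmbiguous());
        System.out.println(skipListRejectsNullValue());
    }
}
```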

> What is the reason for using a ConcurrentSkipListMap for columns_ in
> ColumnFamily compared to using the set version and use the comparator to
> sort on the name field in IColumn?

?

> For the call get_range_slice() you get all the rows returned even though
> they might have been deleted,

Yes, that is the point.

> is it really that expensive to check if the list is empty before returning
> that row

Yes, because you have to check the entire row, which may be much
larger than the given predicate.

-Jonathan