Ok... 

If you want to do type checking and schema enforcement... 

You will need to do this as a coprocessor. 

The quick and dirty way (not recommended) would be to hard-code the schema 
into the coprocessor code. 

A better way: at startup, use ZooKeeper (ZK) to manage the set of known table 
schemas, each of which would be a map of column qualifier to data type. 
(If a column holds JSON, then you need to do a separate lookup to get the 
record's schema.)

Then write a single Java class that does the lookup and handles the comparators 
for the known data types. 
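
To make the lookup idea a bit more concrete, here is a rough sketch in plain Java. Everything in it is an assumption for illustration: the names SchemaRegistry and ColumnType are made up, the schema is registered by hand instead of being loaded from ZK, and the byte layouts (4-byte int, 8-byte long, 8 bytes per item for long arrays, UTF-8 strings) are just one possible encoding. It is not HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical schema lookup: column qualifier -> expected data type. */
final class SchemaRegistry {

    /** Hypothetical set of supported types; each one validates raw cell bytes. */
    enum ColumnType {
        INT { boolean accepts(byte[] v) { return v.length == 4; } },
        LONG { boolean accepts(byte[] v) { return v.length == 8; } },
        // Assumed layout: a long array is stored as 8 bytes per item.
        LONG_ARRAY { boolean accepts(byte[] v) { return v.length % 8 == 0; } },
        STRING {
            boolean accepts(byte[] v) {
                try {
                    // A fresh decoder reports malformed input instead of replacing it.
                    StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(v));
                    return true;
                } catch (CharacterCodingException e) {
                    return false;
                }
            }
        };

        abstract boolean accepts(byte[] value);
    }

    // In the design above this map would be populated from ZK at startup.
    private final Map<String, ColumnType> types = new HashMap<String, ColumnType>();

    void register(String qualifier, ColumnType type) {
        types.put(qualifier, type);
    }

    /** True only if the qualifier is known and the bytes fit its declared type. */
    boolean isValid(String qualifier, byte[] value) {
        ColumnType type = types.get(qualifier);
        return type != null && type.accepts(value);
    }
}
```

A coprocessor hook on the write path could call isValid() for each cell and reject the write when it returns false. The same map could also tell a filter which typed comparator to use for a given qualifier.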

Does this make sense? 
(Sorry, I was kind of thinking this out as I typed the response. But it should 
work.) 

At least it would be the design approach I would take. YMMV

Having said that, I expect someone to say it's a bad idea and that they have a 
better solution. 

HTH

-Mike

On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:

> I see your point. Everything is just bytes.
> 
> However, the schema is known and every row is formatted according to this
> schema, although some columns may not exist, that is, no value exists for
> this property on this row.
> 
> So if I'm able to apply these "typed comparators" to the right cell values
> it may be possible? But I can't find a filter that targets specific columns?
> 
> Seems like all filters scan every column/qualifier and there is no way of
> knowing what column is currently being evaluated?
> 
> 
> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
> <michael_se...@hotmail.com> wrote:
> 
>> You have to remember that HBase doesn't enforce any sort of typing.
>> That's why this can be difficult.
>> 
>> You'd have to write a coprocessor to enforce a schema on a table.
>> Even then YMMV if you're writing JSON structures to a column because while
>> the contents of the structures could be the same, the actual strings could
>> differ.
>> 
>> HTH
>> 
>> -Mike
>> 
>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:
>> 
>>> I realize standard comparators cannot solve this.
>>> 
>>> However I do know the type of each column so writing custom list
>>> comparators for boolean, char, byte, short, int, long, float, double
>>> seems quite straightforward.
>>> 
>>> Long arrays, for example, are stored as a byte array with 8 bytes per
>>> item so a comparator might look like this.
>>> 
>>> public class LongsComparator extends WritableByteArrayComparable {
>>>   public int compareTo(byte[] value, int offset, int length) {
>>>       long[] values = BytesUtils.toLongs(value, offset, length);
>>>       for (long longValue : values) {
>>>           if (longValue == val) {
>>>               return 0;
>>>           }
>>>       }
>>>       return 1;
>>>   }
>>> }
>>> 
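>>> public static long[] toLongs(byte[] value, int offset, int length) {
>>>   // length is the number of bytes in the cell value, 8 bytes per item
>>>   int num = length / 8;
>>>   long[] values = new long[num];
>>>   for (int i = 0; i < num; i++) {
>>>       // read each item relative to the start offset of the cell value
>>>       values[i] = getLong(value, offset + i * 8);
>>>   }
>>>   return values;
>>> }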
>>> 
>>> 
>>> Strings are similar but would require charset and length for each string.
>>> 
>>> public class StringsComparator extends WritableByteArrayComparable  {
>>>   public int compareTo(byte[] value, int offset, int length) {
>>>       String[] values = BytesUtils.toStrings(value, offset, length);
>>>       for (String stringValue : values) {
>>>           if (val.equals(stringValue)) {
>>>               return 0;
>>>           }
>>>       }
>>>       return 1;
>>>   }
>>> }
>>> 
>>> public static String[] toStrings(byte[] value, int offset, int length) {
>>>   ArrayList<String> values = new ArrayList<String>();
>>>   int idx = 0;
>>>   ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
>>>   while (idx < length) {
>>>       int size = buffer.getInt();
>>>       byte[] bytes = new byte[size];
>>>       buffer.get(bytes);
>>>       values.add(new String(bytes));
>>>       idx += 4 + size;
>>>   }
>>>   return values.toArray(new String[values.size()]);
>>> }
>>> 
>>> 
>>> Am I on the right track or maybe overlooking some implementation details?
>>> Not really sure how to target each comparator to a specific column value?
>>> 
>>> 
>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel
>>> <michael_se...@hotmail.com> wrote:
>>> 
>>>> Not an easy task.
>>>> 
>>>> You first need to determine how you want to store the data within a
>>>> column and/or apply a type constraint to a column.
>>>> 
>>>> Even if you use JSON records to store your data within a column, does an
>>>> equality comparator exist? If not, you would have to write one.
>>>> (I kinda think that one may already exist...)
>>>> 
>>>> 
>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <sto...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>> Working with the standard filtering mechanism to scan rows that have
>>>>> columns matching certain criteria.
>>>>> 
>>>>> There are columns of numeric (integer and decimal) and string types.
>>>>> These columns are single or multi-valued like "1", "2", "1,2,3", "a", "b"
>>>>> or "a,b,c" - not sure what the separator would be in the case of list
>>>>> types. Maybe none?
>>>>> 
>>>>> I would like to compose the following queries to filter out rows that
>>>>> do not match.
>>>>> 
>>>>> - contains(String column, String value)
>>>>> Single-valued column whose value String.contains() the provided value.
>>>>> 
>>>>> - equal(String column, Object value)
>>>>> Single-valued column whose value Object.equals() the provided value.
>>>>> Value is either string or numeric type.
>>>>> 
>>>>> - greaterThan(String column, java.lang.Number value)
>>>>> Single-valued column whose value is > the provided numeric value.
>>>>> 
>>>>> - in(String column, Object... values)
>>>>> Multi-valued column whose values Object.equals() all provided values.
>>>>> Values are of string or numeric type.
>>>>> 
>>>>> How would I design a schema that can take advantage of the already
>>>>> existing filters and comparators to accomplish this?
>>>>> 
>>>>> Already looked at the string and binary comparators but fail to see how
>>>>> to solve this in a clean way for multi-valued column values.
>>>>> 
>>>>> I'm aware of custom filters but would like to avoid them if possible.
>>>>> 
>>>>> Cheers,
>>>>> -Kristoffer
>>>> 
>>>> 
>> 
>> 
