On Mon, 1 Feb 2010 11:14:12 -0600 Jonathan Ellis <jbel...@gmail.com> wrote: 

JE> 2010/2/1 Ted Zlatanov <t...@lifelogs.com>:
>> On Mon, 1 Feb 2010 10:41:28 -0600 Jonathan Ellis <jbel...@gmail.com> wrote:
>> 
JE> I don't think this is very useful for column names.  I could see it
JE> being useful for values but if we're going to add predicate queries
JE> then I'd rather do something more general.
>> 
>> Do you have any ideas?

JE> Not really, no.  I think we're best served developing feature X by
JE> starting with problems that can only be solved with X and working from
JE> there.  Going the other direction is asking for trouble.

I looked at the filters, e.g. o.a.c.db.filter.SliceQueryFilter, and it
seems like one place to put predicate logic is in that hierarchy.
Perhaps there can be a PredicateQueryFilter.  Some thought has
apparently already gone into flexible filters at the storage level.  I
hope something happens in this direction but I won't push for it
further since it's not what I need.

The attached patch is how I propose to do bitmasks inside the
SlicePredicate.  As you suggested, it solves the specific problem.  It's
pretty simple and carries no performance penalty if bitmasks are not
used.  It's untested, intended to show the interface and approach I am
proposing.  I didn't open an issue since it's unclear whether this is the
way to go.

Thanks
Ted

Index: cassandra-trunk/interface/cassandra.thrift
===================================================================
--- cassandra-trunk.orig/interface/cassandra.thrift	2010-02-03 10:29:39.000000000 -0600
+++ cassandra-trunk/interface/cassandra.thrift	2010-02-03 10:30:01.000000000 -0600
@@ -217,16 +217,19 @@
     which is described as "a property that the elements of a set have in common."
 
     SlicePredicate's in Cassandra are described with either a list of column_names or a SliceRange.  If column_names is
-    specified, slice_range is ignored.
+    specified, slice_range is ignored.  The optional bitmasks parameter applies in either case.
 
     @param column_name. A list of column names to retrieve. This can be used similar to Memcached's "multi-get" feature
                         to fetch N known column names. For instance, if you know you wish to fetch columns 'Joe', 'Jack',
                         and 'Jim' you can pass those column names as a list to fetch all three at once.
     @param slice_range. A SliceRange describing how to range, order, and/or limit the slice.
+
+    @param bitmasks. A list of OR-ed binary AND masks applied to the result set AFTER the column_names and BEFORE the slice_range.
  */
 struct SlicePredicate {
     1: optional list<binary> column_names,
     2: optional SliceRange   slice_range,
+    3: optional list<binary> bitmasks,
 }
 
 /**
Index: cassandra-trunk/src/java/org/apache/cassandra/db/RangeSliceCommand.java
===================================================================
--- cassandra-trunk.orig/src/java/org/apache/cassandra/db/RangeSliceCommand.java	2010-02-03 10:28:33.000000000 -0600
+++ cassandra-trunk/src/java/org/apache/cassandra/db/RangeSliceCommand.java	2010-02-03 10:30:19.000000000 -0600
@@ -103,6 +103,11 @@
                            StorageService.Verb.RANGE_SLICE,
                            Arrays.copyOf(dob.getData(), dob.getLength()));
     }
+
+    public boolean isPredicateBitmasked()
+    {
+        return null != predicate && predicate.getBitmasksSize() > 0;
+    }
 
     public static RangeSliceCommand read(Message message) throws IOException
     {
Index: cassandra-trunk/src/java/org/apache/cassandra/service/StorageProxy.java
===================================================================
--- cassandra-trunk.orig/src/java/org/apache/cassandra/service/StorageProxy.java	2010-02-03 10:29:14.000000000 -0600
+++ cassandra-trunk/src/java/org/apache/cassandra/service/StorageProxy.java	2010-02-03 10:58:32.000000000 -0600
@@ -32,6 +32,7 @@
 
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.*;
+
 import java.net.InetAddress;
 import org.apache.cassandra.net.IAsyncResult;
 import org.apache.cassandra.net.Message;
@@ -581,7 +582,25 @@
             // if we're done, great, otherwise, move to the next range
             try
             {
-                rows.putAll(handler.get());
+                Map<String, ColumnFamily> rangeRows = handler.get();
+                if (command.isPredicateBitmasked())
+                {
+                    List<byte[]> bitmasks = command.predicate.getBitmasks();
+                    for (ColumnFamily cf: rangeRows.values())
+                    {
+                        if (cf == null) continue; // a row may come back with no column family
+                        for (byte[] column: cf.getColumnNames())
+                        {
+                            if (!matchesBitmasks(column, bitmasks))
+                            {
+                                // this column has failed the bitmask filter
+                                cf.remove(column);
+                            }
+                        }
+                    }
+                }
+
+                rows.putAll(rangeRows);
             }
             catch (DigestMismatchException e)
             {
@@ -610,6 +629,28 @@
         rangeStats.add(System.currentTimeMillis() - startTime);
         return results;
     }
+
+    static boolean matchesBitmasks(byte[] value, List<byte[]> bitmasks)
+    {
+        BITMASK: for (byte[] bitmask: bitmasks)
+        {
+            // a mask matches if every byte of the common prefix shares a set bit with the value
+            int limit = Math.min(bitmask.length, value.length);
+            for (int offset=0; offset < limit; offset++)
+            {
+                if (0 == (bitmask[offset] & value[offset]))
+                {
+                    continue BITMASK;
+                }
+            }
+
+            // this will only return true if a bitmask has successfully matched the value
+            return true;
+        }
+
+        // none of the bitmasks were successful
+        return false;
+    }
 
     public long getReadOperations()
     {
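
To make the matching rule easy to check without a Cassandra build, here is a
standalone sketch that copies the logic of matchesBitmasks from the patch and
applies it to a ConcurrentSkipListMap standing in for a ColumnFamily's
columns.  The class BitmaskDemo, the filter helper and the sample masks are
hypothetical, made up only for this illustration; it assumes Java 9+ for
Arrays.compare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class, not part of Cassandra or of the patch above.
public class BitmaskDemo
{
    // Same rule as matchesBitmasks in the patch: a value passes if ANY mask
    // matches, and a mask matches when every byte of the common prefix
    // shares at least one set bit with the value.
    static boolean matchesBitmasks(byte[] value, List<byte[]> bitmasks)
    {
        BITMASK: for (byte[] bitmask : bitmasks)
        {
            int limit = Math.min(bitmask.length, value.length);
            for (int offset = 0; offset < limit; offset++)
            {
                if (0 == (bitmask[offset] & value[offset]))
                    continue BITMASK; // this mask failed, try the next one
            }
            return true;
        }
        return false; // no mask matched
    }

    // Hypothetical helper mimicking the StorageProxy loop: drop every column
    // name that fails the masks.  ConcurrentSkipListMap iterators are weakly
    // consistent, so removing entries while iterating does not throw.
    static void filter(ConcurrentSkipListMap<byte[], String> columns, List<byte[]> bitmasks)
    {
        for (byte[] name : columns.keySet())
        {
            if (!matchesBitmasks(name, bitmasks))
                columns.remove(name);
        }
    }

    public static void main(String[] args)
    {
        List<byte[]> masks = new ArrayList<>();
        masks.add(new byte[] { 0x01 });       // bit 0 set in the first byte
        masks.add(new byte[] { 0x10, 0x10 }); // OR bit 4 set in each of the first two bytes

        // byte[] keys need an explicit comparator
        ConcurrentSkipListMap<byte[], String> columns =
            new ConcurrentSkipListMap<byte[], String>((a, b) -> Arrays.compare(a, b));
        columns.put(new byte[] { 0x01 }, "a");       // matches the first mask
        columns.put(new byte[] { 0x12, 0x30 }, "b"); // matches the second mask
        columns.put(new byte[] { 0x02 }, "c");       // matches neither, gets removed

        filter(columns, masks);
        System.out.println(columns.values()); // prints [a, b]
    }
}
```

Note that under this rule a 0x00 byte in a mask can never match, so zero
bytes cannot act as "don't care" positions, and a value shorter than a mask
is only compared over its own length.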
