Re: Returning a UDT from a user defined function (UDF)

2016-04-07 Thread Henry M
What I wanted to do does not seem to be possible (probably a limitation
or a bug)... I do see a way to get the KeyspaceMetadata and, from that, the
UserType instance (code lines 1 & 2 below).

1.)

org.apache.cassandra.schema.KeyspaceMetadata ksm =
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");


2.)

com.datastax.driver.core.UserType myUdt = ksm.types.get("my_other_udt").get();


But this fails with the below error because Schema is not a whitelisted
package. It probably should not be whitelisted but there should be a way to
create and return a user defined type.

:88:InvalidRequest: code=2200 [Invalid query] message="Could not
compile function 'test_ks.transform_udt' from Java source:
org.apache.cassandra.exceptions.InvalidRequestException: Java source
compilation failed:
Line 4: org.apache.cassandra.schema.KeyspaceMetadata cannot be resolved to
a type
Line 4: org.apache.cassandra.config.Schema.instance cannot be resolved to a
type
"
:90:InvalidRequest: code=2200 [Invalid query] message="Unknown
function 'extract_text_field_sample_udt'"

My updated UDF for complete context.

CREATE OR REPLACE FUNCTION test_ks.transform_udt (val my_udt)
 RETURNS NULL ON NULL INPUT
 RETURNS my_other_udt
 LANGUAGE java
  AS '
String fieldA = val.getString("field_a");

org.apache.cassandra.schema.KeyspaceMetadata ksm =
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");

com.datastax.driver.core.UserType myUdt =
ksm.types.get("my_other_udt").get();

com.datastax.driver.core.UDTValue transformedValue = myUdt.newValue();

transformedValue.setUUID("id", java.util.UUID.randomUUID());
transformedValue.setString("field_a", fieldA);
transformedValue.setString("field_b", "value b");

return transformedValue;
  ';
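
As a possible alternate approach, the same transformation can be done
client-side with the DataStax Java driver that the UDF above already
references. A minimal sketch, assuming the sample schema from the original
post below (the class name, contact point and printed output are placeholders):

import com.datastax.driver.core.*;
import java.util.UUID;

public class UdtTransformClientSide {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            Session session = cluster.connect("test_ks");

            // The driver exposes the type through the keyspace metadata,
            // so no server-side Schema access is needed.
            UserType otherUdt = cluster.getMetadata()
                    .getKeyspace("test_ks")
                    .getUserType("my_other_udt");

            Row row = session.execute("SELECT col_a FROM sample_table LIMIT 1").one();
            UDTValue source = (row == null) ? null : row.getUDTValue("col_a");
            if (source == null) {
                return; // no row, or the row was inserted without col_a
            }

            UDTValue transformed = otherUdt.newValue()
                    .setUUID("id", UUID.randomUUID())
                    .setString("field_a", source.getString("field_a"))
                    .setString("field_b", "value b");

            System.out.println(transformed);
        }
    }
}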


On Thu, Apr 7, 2016 at 7:40 PM Henry M  wrote:

> I was wondering if it is possible to create a UDT and return it within a
> user defined function.
>
> I looked at this documentation
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDF.html but
> the examples are only for basic types.
>
> This is my pseudo code I came up with... the part I think I am missing is
> how to get an instance of the UserType so that I can invoke newValue to
> create a UDTValue.
>
> Has anyone done this and know how to get the keyspace in order to call
> getUserType? Or know of an alternate approach?
>
> CREATE OR REPLACE FUNCTION test_ks.transform_udt (val my_udt)
>  RETURNS NULL ON NULL INPUT
>  RETURNS my_other_udt
>  LANGUAGE java
>   AS '
> String fieldA = val.getString("field_a");
>
> // How do you get a reference to the user type?
> UserType myUdt = ?keyspace?.getUserType("my_other_udt");
>
> UDTValue transformedValue = myUdt.newValue();
>
> transformedValue.setUUID("id", UUID.randomUUID());
> transformedValue.setString("field_a", fieldA);
> transformedValue.setString("field_b", "value b");
>
> return transformedValue;
>   ';
>
>
> Thank you,
> Henry
>
>
> P.S. This is the setup for my sample table and types.
>
> drop keyspace test_ks;
>
> create keyspace test_ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 1 };
>
> use test_ks;
>
> CREATE TYPE IF NOT EXISTS test_ks.my_udt (field_a text, field_b text);
> CREATE TYPE IF NOT EXISTS test_ks.my_other_udt (id uuid, field_a text, 
> field_b text);
>
> CREATE TABLE IF NOT EXISTS test_ks.sample_table(id uuid primary key, col_a 
> frozen<my_udt>);
>
> INSERT INTO sample_table(id, col_a) VALUES ( now() , { field_a: 'value 1', 
> field_b: 'value 2'} );
> INSERT INTO sample_table(id) VALUES ( now() );
>
>
>
>
>
>


Re: Mapping a continuous range to a discrete value

2016-04-07 Thread Henry M
I had to do something similar (in my case it was an IN query)... I ended
up writing a hack in Java to create a custom Expression and inject it into
the RowFilter of a dummy secondary index (not advisable and very short term,
but it keeps my application code clean). I am keeping my eyes open for the
evolution of SASI indexes (starting with Cassandra 3.4
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md), which should do
what you are looking for.
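
As a rough sketch of how that could look on 3.4+, based on the syntax in the
SASI documentation linked above (the table, column and index names are
illustrative):

CREATE TABLE readings (id uuid PRIMARY KEY, lower int, upper int);

CREATE CUSTOM INDEX readings_lower_idx ON readings (lower)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';
CREATE CUSTOM INDEX readings_upper_idx ON readings (upper)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- With both columns indexed, a between-style lookup can be expressed directly
-- (depending on the version you may still need to append ALLOW FILTERING):
SELECT * FROM readings WHERE lower <= 150 AND upper >= 150;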



On Thu, Apr 7, 2016 at 11:06 AM Mitch Gitman  wrote:

> I just happened to run into a similar situation myself and I can see it's
> through a bad schema design (and query design) on my part. What I wanted to
> do was narrow down by the range on one clustering column and then by
> another range on the next clustering column. Failing to adequately think
> through how Cassandra stores its sorted rows on disk, I just figured, hey,
> why not?
>
> The result? The same error message you got. But then, going back over some
> old notes from a DataStax CQL webinar, I came across this (my words):
>
> "You can do selects with combinations of the different primary keys
> including ranges on individual columns. The range will only work if you've
> narrowed things down already by equality on all the prior columns.
> Cassandra creates a composite type to store the column name."
>
> My new solution in response. Create two tables: one that's sorted by (in
> my situation) a high timestamp, the other that's sorted by (in my
> situation) a low timestamp. What had been two clustering columns gets
> broken up into one clustering column each in two different tables. Then I
> do two queries, one with the one range, the other with the other, and I
> programmatically merge the results.
>
> The funny thing is, that was my original design which my most recent, and
> failed, design is replacing. My new solution goes back to my old solution.
>
> On Thu, Apr 7, 2016 at 1:37 AM, Peer, Oded  wrote:
>
>> I have a table mapping continuous ranges to discrete values.
>>
>>
>>
>> CREATE TABLE range_mapping (k int, lower int, upper int, mapped_value
>> int, PRIMARY KEY (k, lower, upper));
>>
>> INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 0,
>> 99, 0);
>>
>> INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 100,
>> 199, 100);
>>
>> INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 200,
>> 299, 200);
>>
>>
>>
>> I then want to query this table to find mapping of a specific value.
>>
>> In SQL I would use: *select mapped_value from range_mapping where k=0
>> and ? between lower and upper*
>>
>>
>>
>> If the variable is bound to the value 150 then the mapped_value returned
>> is 100.
>>
>>
>>
>> I can’t use the same type of query in CQL.
>>
>> Using the query “*select * from range_mapping where k = 0 and lower <=
>> 150 and upper >= 150;*” returns an error "Clustering column "upper"
>> cannot be restricted (preceding column "lower" is restricted by a non-EQ
>> relation)"
>>
>>
>>
>> I thought of using multi-column restrictions but they don’t work as I
>> expected as the following query returns two rows instead of the one I
>> expected:
>>
>>
>>
>> *select * from range_mapping where k = 0 and (lower,upper) <= (150,999)
>> and (lower,upper) >= (-999,150);*
>>
>>
>>
>>  k | lower | upper | mapped_value
>> ---+-------+-------+--------------
>>  0 |     0 |    99 |            0
>>  0 |   100 |   199 |          100
>>
>>
>>
>> I’d appreciate any thoughts on the subject.
>>
>>
>>
>
>


Returning a UDT from a user defined function (UDF)

2016-04-07 Thread Henry M
I was wondering if it is possible to create a UDT and return it within a
user defined function.

I looked at this documentation
http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDF.html but the
examples are only for basic types.

This is my pseudo code I came up with... the part I think I am missing is
how to get an instance of the UserType so that I can invoke newValue to
create a UDTValue.

Has anyone done this and know how to get the keyspace in order to call
getUserType? Or know of an alternate approach?

CREATE OR REPLACE FUNCTION test_ks.transform_udt (val my_udt)
 RETURNS NULL ON NULL INPUT
 RETURNS my_other_udt
 LANGUAGE java
  AS '
String fieldA = val.getString("field_a");

// How do you get a reference to the user type?
UserType myUdt = ?keyspace?.getUserType("my_other_udt");

UDTValue transformedValue = myUdt.newValue();

transformedValue.setUUID("id", UUID.randomUUID());
transformedValue.setString("field_a", fieldA);
transformedValue.setString("field_b", "value b");

return transformedValue;
  ';


Thank you,
Henry


P.S. This is the setup for my sample table and types.

drop keyspace test_ks;

create keyspace test_ks WITH REPLICATION = { 'class' :
'SimpleStrategy', 'replication_factor' : 1 };

use test_ks;

CREATE TYPE IF NOT EXISTS test_ks.my_udt (field_a text, field_b text);
CREATE TYPE IF NOT EXISTS test_ks.my_other_udt (id uuid, field_a text,
field_b text);

CREATE TABLE IF NOT EXISTS test_ks.sample_table(id uuid primary key,
col_a frozen<my_udt>);

INSERT INTO sample_table(id, col_a) VALUES ( now() , { field_a: 'value
1', field_b: 'value 2'} );
INSERT INTO sample_table(id) VALUES ( now() );


Re: Mapping a continuous range to a discrete value

2016-04-07 Thread Mitch Gitman
I just happened to run into a similar situation myself and I can see it's
through a bad schema design (and query design) on my part. What I wanted to
do was narrow down by the range on one clustering column and then by
another range on the next clustering column. Failing to adequately think
through how Cassandra stores its sorted rows on disk, I just figured, hey,
why not?

The result? The same error message you got. But then, going back over some
old notes from a DataStax CQL webinar, I came across this (my words):

"You can do selects with combinations of the different primary keys
including ranges on individual columns. The range will only work if you've
narrowed things down already by equality on all the prior columns.
Cassandra creates a composite type to store the column name."

My new solution in response. Create two tables: one that's sorted by (in my
situation) a high timestamp, the other that's sorted by (in my situation) a
low timestamp. What had been two clustering columns gets broken up into one
clustering column each in two different tables. Then I do two queries, one
with the one range, the other with the other, and I programmatically merge
the results.
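
Applied to the range_mapping table from the original post, that two-table
layout might look roughly like this (the table names are illustrative):

CREATE TABLE range_mapping_by_lower (
    k int, lower int, upper int, mapped_value int,
    PRIMARY KEY (k, lower, upper)
);

CREATE TABLE range_mapping_by_upper (
    k int, upper int, lower int, mapped_value int,
    PRIMARY KEY (k, upper, lower)
);

-- One single-column range per query; the application merges/intersects the results:
SELECT * FROM range_mapping_by_lower WHERE k = 0 AND lower <= 150;
SELECT * FROM range_mapping_by_upper WHERE k = 0 AND upper >= 150;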

The funny thing is, that was my original design which my most recent, and
failed, design is replacing. My new solution goes back to my old solution.

On Thu, Apr 7, 2016 at 1:37 AM, Peer, Oded  wrote:

> I have a table mapping continuous ranges to discrete values.
>
>
>
> CREATE TABLE range_mapping (k int, lower int, upper int, mapped_value int,
> PRIMARY KEY (k, lower, upper));
>
> INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 0,
> 99, 0);
>
> INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 100,
> 199, 100);
>
> INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 200,
> 299, 200);
>
>
>
> I then want to query this table to find mapping of a specific value.
>
> In SQL I would use: *select mapped_value from range_mapping where k=0 and
> ? between lower and upper*
>
>
>
> If the variable is bound to the value 150 then the mapped_value returned
> is 100.
>
>
>
> I can’t use the same type of query in CQL.
>
> Using the query “*select * from range_mapping where k = 0 and lower <=
> 150 and upper >= 150;*” returns an error "Clustering column "upper"
> cannot be restricted (preceding column "lower" is restricted by a non-EQ
> relation)"
>
>
>
> I thought of using multi-column restrictions but they don’t work as I
> expected as the following query returns two rows instead of the one I
> expected:
>
>
>
> *select * from range_mapping where k = 0 and (lower,upper) <= (150,999)
> and (lower,upper) >= (-999,150);*
>
>
>
>  k | lower | upper | mapped_value
> ---+-------+-------+--------------
>  0 |     0 |    99 |            0
>  0 |   100 |   199 |          100
>
>
>
> I’d appreciate any thoughts on the subject.
>
>
>


Re: Efficiently filtering results directly in CS

2016-04-07 Thread Jonathan Haddad
What is CS?

On Thu, Apr 7, 2016 at 10:03 AM Kevin Burton  wrote:

> I have a paging model whereby we stream data from CS by fetching 'pages'
> thereby reading (sequentially) entire datasets.
>
> We're using the bucket approach where we write data for 5 minutes, then we
> can just fetch the bucket for that range.
>
> Our app now has TONS of data and we have a piece of middleware that
> filters it based on the client requests.
>
> So if they only want English they just get English, filtering away about
> 60% of our data.
>
> but it doesn't support condition pushdown.  So ALL this data has to be
> sent from our CS boxes to our middleware and filtered there (wasting a lot
> of network IO).
>
> Is there a way (including refactoring the code) that I could push this
> into CS?  Maybe some way I could discover the CS topology and put daemons
> on each of our CS boxes and fetch from CS directly (doing the filtering
> there).
>
> Thoughts?
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>


Efficiently filtering results directly in CS

2016-04-07 Thread Kevin Burton
I have a paging model whereby we stream data from CS by fetching 'pages'
thereby reading (sequentially) entire datasets.

We're using the bucket approach where we write data for 5 minutes, then we
can just fetch the bucket for that range.

Our app now has TONS of data and we have a piece of middleware that filters
it based on the client requests.

So if they only want English they just get English, filtering away about
60% of our data.

but it doesn't support condition pushdown.  So ALL this data has to be sent
from our CS boxes to our middleware and filtered there (wasting a lot of
network IO).

Is there a way (including refactoring the code) that I could push this
into CS?  Maybe some way I could discover the CS topology and put daemons
on each of our CS boxes and fetch from CS directly (doing the filtering
there).
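
One possible direction, sketched here purely for illustration: pull the
attribute being filtered on (say, the language) into the partition key of the
bucketed table, so the restriction is applied on the CS side rather than in
the middleware. All names below are hypothetical:

CREATE TABLE content_by_bucket (
    bucket timestamp,   -- the 5-minute write bucket
    lang text,
    id timeuuid,
    body text,
    PRIMARY KEY ((bucket, lang), id)
);

-- Clients that only want English read just the matching partition:
SELECT id, body FROM content_by_bucket
WHERE bucket = '2016-04-07 10:00:00' AND lang = 'en';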

Thoughts?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Removing a DC

2016-04-07 Thread Joel Knighton
This sounds most like https://issues.apache.org/jira/browse/CASSANDRA-10371.

Are you on a version that could be affected by this issue?

Best,
Joel

On Thu, Apr 7, 2016 at 11:51 AM, Anubhav Kale 
wrote:

> Hello,
>
>
>
> We removed a DC using instructions from
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_decomission_dc_t.html
>
>
>
> After all nodes were gone,
>
>
>
> 1.   System.peers doesn’t have an entry for the nodes that were
> removed. (confirmed via a cqlsh query with consistency all)
>
> 2.   Nodetool describecluster doesn’t show them
>
> 3.   Nodetool gossipinfo shows them as “LEFT”.
>
>
>
> However, the logs continue to spew the lines below and restarting the node
> doesn’t get rid of this. I am thinking a rolling restart of all nodes might
> fix it, but I am curious as to where this information is still held? I don’t
> think this is causing any badness to the cluster, but I would like to get rid
> of this if possible.
>
>
>
> INFO  [GossipStage:83] 2016-04-07 16:38:07,859  Gossiper.java:998 -
> InetAddress /10.1.200.14 is now DOWN
>
> INFO  [GossipStage:83] 2016-04-07 16:38:07,861  StorageService.java:1914 -
> Removing tokens[*BLAH*] for /10.1.200.14
>
>
>
> Thanks !
>



-- 



Joel Knighton
Cassandra Developer | joel.knigh...@datastax.com



Removing a DC

2016-04-07 Thread Anubhav Kale
Hello,

We removed a DC using instructions from 
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_decomission_dc_t.html

After all nodes were gone,


1.   System.peers doesn't have an entry for the nodes that were removed. 
(confirmed via a cqlsh query with consistency all)

2.   Nodetool describecluster doesn't show them

3.   Nodetool gossipinfo shows them as "LEFT".

However, the logs continue to spew the lines below, and restarting the node 
doesn't get rid of this. I am thinking a rolling restart of all nodes might fix 
it, but I am curious as to where this information is still held? I don't think 
this is causing any badness to the cluster, but I would like to get rid of this 
if possible.

INFO  [GossipStage:83] 2016-04-07 16:38:07,859  Gossiper.java:998 - InetAddress 
/10.1.200.14 is now DOWN
INFO  [GossipStage:83] 2016-04-07 16:38:07,861  StorageService.java:1914 - 
Removing tokens[BLAH] for /10.1.200.14

Thanks !


Cassandra nodes using internal network to try and talk externally

2016-04-07 Thread Chris Elsmore
Hi,

I have a Cassandra 2.2.5 cluster with a datacenter DC03 with 5 nodes in a ring 
and I have DC04 with one node. 

Setup by default with all nodes talking on the external interfaces works well, 
no problems, all nodes in each DC can see and talk to each other.

I’m trying to follow the instructions here 
http://docs.datastax.com/en/cassandra/2.2/cassandra/configuration/configMultiNetworks.html
 for the node in DC04 in preparation for adding a new node.

When I follow the instructions to set the listen_address to the internal 
address, broadcast address to the external address and to set 
listen_on_broadcast to true, the nodes in DC03 can connect but do not handshake 
with the node in DC04. The output of ‘lsof -i -P | grep 7000’ shows that the 
node in DC04 is trying to connect to the IPs of the nodes in DC04 over the 
internal network, which obviously doesn’t work.
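
For reference, a minimal cassandra.yaml sketch of the settings described above
for the DC04 node (the addresses are placeholders, not the real ones):

listen_address: 10.10.1.5            # internal (private) interface
broadcast_address: 203.0.113.5       # external address advertised to the other DC
listen_on_broadcast_address: true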

Any clues? I’m at a loss!


Chris



Re: Cassandra Single Node Setup Questions

2016-04-07 Thread Jack Krupansky
Not that we aren't enthusiastic about you moving to Cassandra, but it needs
to be for the right reasons, and for Cassandra the right reasons are
scaling and HA.

In case it's not obvious, I would make a really lousy used-car or
real-estate/time-share salesman!

-- Jack Krupansky

On Thu, Apr 7, 2016 at 10:13 AM, Eric Evans  wrote:

> On Wed, Apr 6, 2016 at 9:15 AM, Bhupendra Baraiya
>  wrote:
> >
> > The main reason we want to migrate to Cassandra is we have a
> denormalized data structure in Ms Sql server Database and we want to move
> to Open source database...
>
>
> If it all boils down to this, then you might want to consider MySQL or
> Postgres.
>
>
> --
> Eric Evans
> eev...@wikimedia.org
>


Re: Cassandra Single Node Setup Questions

2016-04-07 Thread Eric Evans
On Wed, Apr 6, 2016 at 9:15 AM, Bhupendra Baraiya
 wrote:
>
> The main reason we want to migrate to Cassandra is we have a denormalized 
> data structure in Ms Sql server Database and we want to move to Open source 
> database...


If it all boils down to this, then you might want to consider MySQL or Postgres.


-- 
Eric Evans
eev...@wikimedia.org


Re: secondary index queries with thrift in cassandra 3.x supported ?

2016-04-07 Thread Sam Tunnicliffe
That certainly looks like a bug, would you mind opening a ticket at
https://issues.apache.org/jira/browse/CASSANDRA please?

Thanks,
Sam

On Thu, Apr 7, 2016 at 2:19 PM, Ivan Georgiev  wrote:

> Hi, are secondary index queries with thrift supported in Cassandra 3.x ?
> Asking as I am not able to get them working.
>
> I am doing a get_range_slices call with row_filter set in the KeyRange
> property, but I am getting an exception in the server with the following
> trace:
>
>
>
> INFO   | jvm 1| 2016/04/07 14:56:35 | 14:56:35.403 [Thrift:16] DEBUG
> o.a.cassandra.service.ReadCallback - Failed; received 0 of 1 responses
>
> INFO   | jvm 1| 2016/04/07 14:56:35 | 14:56:35.404
> [SharedPool-Worker-1] WARN  o.a.c.c.AbstractLocalAwareExecutorService -
> Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
>
> INFO   | jvm 1| 2016/04/07 14:56:35 | java.lang.RuntimeException:
> java.lang.NullPointerException
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2450)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_72]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> [apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 | Caused by:
> java.lang.NullPointerException: null
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.index.internal.keys.KeysSearcher.filterIfStale(KeysSearcher.java:155)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.index.internal.keys.KeysSearcher.access$300(KeysSearcher.java:36)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.index.internal.keys.KeysSearcher$1.prepareNext(KeysSearcher.java:104)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.index.internal.keys.KeysSearcher$1.hasNext(KeysSearcher.java:70)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1792)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   at
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2446)
> ~[apache-cassandra-3.0.4.jar:3.0.4]
>
> INFO   | jvm 1| 2016/04/07 14:56:35 |   ... 4 common
> frames omitted
>
>
>
> Are we still able to do thrift secondary index queries ? Using Cassandra
> 3.0.4. Same call works fine with Cassandra 2.2.5.
>
>
>
> Regards:
>
> Ivan
>


secondary index queries with thrift in cassandra 3.x supported ?

2016-04-07 Thread Ivan Georgiev
Hi, are secondary index queries with thrift supported in Cassandra 3.x ?
Asking as I am not able to get them working.

I am doing a get_range_slices call with row_filter set in the KeyRange
property, but I am getting an exception in the server with the following
trace:

 

INFO   | jvm 1| 2016/04/07 14:56:35 | 14:56:35.403 [Thrift:16] DEBUG
o.a.cassandra.service.ReadCallback - Failed; received 0 of 1 responses

INFO   | jvm 1| 2016/04/07 14:56:35 | 14:56:35.404 [SharedPool-Worker-1]
WARN  o.a.c.c.AbstractLocalAwareExecutorService - Uncaught exception on
thread Thread[SharedPool-Worker-1,5,main]: {}

INFO   | jvm 1| 2016/04/07 14:56:35 | java.lang.RuntimeException:
java.lang.NullPointerException

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy
.java:2450) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_72]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask
.run(AbstractLocalAwareExecutorService.java:164)
~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]

INFO   | jvm 1| 2016/04/07 14:56:35 | Caused by:
java.lang.NullPointerException: null

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.index.internal.keys.KeysSearcher.filterIfStale(KeysSear
cher.java:155) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.index.internal.keys.KeysSearcher.access$300(KeysSearche
r.java:36) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.index.internal.keys.KeysSearcher$1.prepareNext(KeysSear
cher.java:104) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.index.internal.keys.KeysSearcher$1.hasNext(KeysSearcher
.java:70) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java
:72) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.s
erialize(UnfilteredPartitionIterators.java:295)
~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.ja
va:134) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127)
~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65
) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289)
~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(Stor
ageProxy.java:1792) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   at
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy
.java:2446) ~[apache-cassandra-3.0.4.jar:3.0.4]

INFO   | jvm 1| 2016/04/07 14:56:35 |   ... 4 common frames
omitted

 

Are we still able to do thrift secondary index queries ? Using Cassandra
3.0.4. Same call works fine with Cassandra 2.2.5.

 

Regards:

Ivan



Cassandra experts/consulting in Russia

2016-04-07 Thread Roman Skvazh
Hello guys!
Can you suggest a consulting company or a specialist in Apache Cassandra in 
Russia?
We need expert support/consulting for our production clusters.

Thank you!

———
Roman Skvazh



Mapping a continuous range to a discrete value

2016-04-07 Thread Peer, Oded
I have a table mapping continuous ranges to discrete values.

CREATE TABLE range_mapping (k int, lower int, upper int, mapped_value int, 
PRIMARY KEY (k, lower, upper));
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 0, 99, 0);
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 100, 199, 
100);
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 200, 299, 
200);

I then want to query this table to find mapping of a specific value.
In SQL I would use: select mapped_value from range_mapping where k=0 and ? 
between lower and upper

If the variable is bound to the value 150 then the mapped_value returned is 100.

I can't use the same type of query in CQL.
Using the query "select * from range_mapping where k = 0 and lower <= 150 and 
upper >= 150;" returns an error "Clustering column "upper" cannot be restricted 
(preceding column "lower" is restricted by a non-EQ relation)"

I thought of using multi-column restrictions but they don't work as I expected 
as the following query returns two rows instead of the one I expected:

select * from range_mapping where k = 0 and (lower,upper) <= (150,999) and 
(lower,upper) >= (-999,150);

 k | lower | upper | mapped_value
---+-------+-------+--------------
 0 |     0 |    99 |            0
 0 |   100 |   199 |          100

I'd appreciate any thoughts on the subject.
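
The two rows show up because a multi-column restriction compares the listed
clustering columns as a single tuple, in clustering order, rather than bounding
each column separately. Roughly:

-- (0, 99)    >= (-999, 150)  holds because 0 > -999   (the second component is never compared)
-- (100, 199) >= (-999, 150)  holds because 100 > -999
-- (0, 99)    <= (150, 999)   holds because 0 < 150
-- (100, 199) <= (150, 999)   holds because 100 < 150
-- so neither restriction ever forces upper >= 150 on its own, and both rows qualify.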



RE: all the nodes are not reachable when running massive deletes

2016-04-07 Thread Paco Trujillo
Well, then you could try to replace this node as soon as you have more nodes 
available. I would use this procedure, as I believe it is the most efficient 
one: 
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.

It is not always the same node: it is always one node out of the seven in the 
cluster that has the high load, but not always the same one.

With respect to the question about the hardware (from one of the nodes; all of 
them have the same configuration):

Disk:


-  We use SSD disks

-  Output from iostat -mx 5 100:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,00    0,00    0,40    0,03    0,00   98,57

Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0,00    0,00   0,00   0,20   0,00   0,00     8,00     0,00   0,00   0,00   0,00
sdb        0,00    0,00   0,00   0,00   0,00   0,00     0,00     0,00   0,00   0,00   0,00
sdc        0,00    0,00   0,00   0,00   0,00   0,00     0,00     0,00   0,00   0,00   0,00
sdd        0,00    0,20   0,00   0,40   0,00   0,00    12,00     0,00   2,50   2,50   0,10


-  Logs: I do not see anything in the messages log except this:

Apr  3 03:07:01 GT-cassandra7 rsyslogd: [origin software="rsyslogd" 
swVersion="5.8.10" x-pid="1504" x-info="http://www.rsyslog.com"] rsyslogd was 
HUPed
Apr  3 18:24:55 GT-cassandra7 ntpd[1847]: 0.0.0.0 06a8 08 no_sys_peer
Apr  4 06:56:18 GT-cassandra7 ntpd[1847]: 0.0.0.0 06b8 08 no_sys_peer

CPU:


-  General use: 1 – 4 %

-  Worst case: 98%. This is when the problem appears: running massive 
deletes (even when a different machine is the one receiving the deletes) or 
running a repair.

RAM:


-  We are using CMS.

-  Each node has 16GB, and we dedicate the following to Cassandra:

o   MAX_HEAP_SIZE="10G"

o   HEAP_NEWSIZE="800M"


Regarding the rest of the questions you mention:


-  Clients: we use the DataStax Java driver with this configuration:
// Get contact points
String[] contactPoints = this.environment.getRequiredProperty(CASSANDRA_CLUSTER_URL).split(",");
cluster = com.datastax.driver.core.Cluster.builder()
        .addContactPoints(contactPoints)
        //.addContactPoint(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_URL))
        .withCredentials(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_USERNAME),
                this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PASSWORD))
        .withQueryOptions(new QueryOptions()
                .setConsistencyLevel(ConsistencyLevel.QUORUM))
        //.withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy(CASSANDRA_PRIMARY_CLUSTER)))
        .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
        //.withLoadBalancingPolicy(new TokenAwarePolicy((LoadBalancingPolicy) new RoundRobinBalancingPolicy()))
        .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
        .withPort(Integer.parseInt(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PORT)))
        .build();

So requests should be evenly distributed.


-  Deletes are contained in a CQL file, and I am using cqlsh to execute 
them. I will try to run the deletes in small batches and on separate nodes, but 
the same problem appears when running repairs.

I think the problem is related to one specific column family:

CREATE TABLE snpaware.snpsearch (
idline1 bigint,
idline2 bigint,
partid int,
id uuid,
alleles int,
coverage int,
distancetonext int,
distancetonextbyline int,
distancetoprev int,
distancetoprevbyline int,
frequency double,
idindividual bigint,
idindividualmorph bigint,
idreferencebuild bigint,
isinexon boolean,
isinorf boolean,
max_length int,
morphid bigint,
position int,
qualityflag int,
ranking int,
referencebuildlength int,
snpsearchid uuid,
synonymous boolean,
PRIMARY KEY ((idline1, idline2, partid), id)
) WITH CLUSTERING ORDER BY (id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = 'KEYS_ONLY'
AND comment = 'Table with the snp between lines'
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
   AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND index_interval = 128
AND memtable_flush_period_in_ms = 0
AND populate_io_cache_on_flush = false
AND read_repair_chance = 0.1

RE: nodetool drain running for days

2016-04-07 Thread Paco Trujillo
Hi Jeff

Thanks for your answer.


-  Regarding the drain, I will proceed as you indicate: run a flush 
and then shut down the node.


-  Regarding the Cassandra upgrade: I want to upgrade the version of 
the cluster because we are having problems with timeouts (all the nodes become 
unresponsive) when running compactions. In another question I asked on this 
list to find a solution to this problem, someone recommended two things: add 
new nodes to the cluster (I have already ordered two new nodes) and upgrade the 
version of Cassandra, because version 2.0.17 is very old and newer versions, 
especially 2.1, have a lot of performance improvements (which is probably the 
problem we are facing).



-  We have a demo environment, but unfortunately the test cluster is not 
the same size and we cannot replicate the data from the live cluster on the 
test cluster because of the size of the live data. Anyway, in our case, 
surprises will not cost millions of dollars and certainly not my job. We are 
also a small company, so even if we upgrade the test cluster, probably nobody 
(I mean real users) will test the application that uses the cluster. This means 
that we will probably not detect the bugs even in the test cluster.


From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: woensdag 6 april 2016 19:13
To: user@cassandra.apache.org
Subject: Re: nodetool drain running for days

Drain should not run for days – if it were me, I’d be:

 *   Checking for ‘DRAINED’ in the server logs
 *   Running ‘nodetool flush’ just to explicitly flush the commitlog/memtables 
(generally useful before doing drain, too, it can be somewhat race-y)
 *   Explicitly killing cassandra following the flush – drain should simply be 
a flush + shutdown of everything, so it should take on the order of seconds, not days.


For your question about 3.0: historically, Cassandra has had some bugs in new 
major versions -

Hints were broken from 1.0.0 to 1.0.3 - 
https://issues.apache.org/jira/browse/CASSANDRA-3466
Hints were broken again from 1.1.0 to 1.1.6 - 
https://issues.apache.org/jira/browse/CASSANDRA-4772
There was a corruption bug in 2.0 until 2.0.8 - 
https://issues.apache.org/jira/browse/CASSANDRA-6285
There were a number of rough edges in 2.1, including a memory leak fixed in 
2.1.7 - https://issues.apache.org/jira/browse/CASSANDRA-9549
Compaction kept stopping in 2.2.0 until 2.2.2 - 
https://issues.apache.org/jira/browse/CASSANDRA-10270

Because of this history of “bugs in new versions", many operators choose to 
hold off on going to new versions until they’re “better tested”. The catch-22 
is obvious here: if nobody uses it, nobody tests it in the real world to find 
the bugs not discovered in automated testing. The Datastax folks did some 
awesome work for 3.0 to extend the unit and distributed tests – they’re MUCH 
better than they were in 2.2, so hopefully there are fewer surprise bugs in 3+, 
but there’s bound to be a few. The apache team has also changed the release 
cycle to release more frequently, so that there’s less new code in each release 
(see http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ ).  If 
you’ve got a lab/demo/stage/test environment that can tolerate some outages, I 
definitely encourage you to upgrade there, first. If a few surprise issues will 
cost your company millions of dollars, or will cost you your job, let someone 
else upgrade and be the guinea pig, and don’t upgrade until you’re compelled to 
do so because of a bug fix you need, or a feature that won’t be in the version 
you’re running.



From: Paco Trujillo
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, April 5, 2016 at 11:12 PM
To: "user@cassandra.apache.org"
Subject: nodetool drain running for days

We are having performance problems with our cluster regarding timeouts when 
repairs or massive deletes are running. One of the pieces of advice I received 
was to update our Cassandra version from 2.0.17 to 2.2. I am draining one of 
the nodes to start the upgrade and the drain has now been running for two days. 
In the logs I only see lines like these from time to time:

INFO [ScheduledTasks:1] 2016-04-06 08:17:10,987 ColumnFamilyStore.java (line 
808) Enqueuing flush of Memtable-sstable_activity@1382334976(15653/226669 
serialized/live bytes, 6023 ops)
INFO [FlushWriter:1468] 2016-04-06 08:17:10,988 Memtable.java (line 362) 
Writing Memtable-sstable_activity@1382334976(15653/226669 serialized/live 
bytes, 6023 ops)
INFO [ScheduledTasks:1] 2016-04-06 08:17:11,004 ColumnFamilyStore.java (line 
808) Enqueuing flush of Memtable-compaction_history@1425848386(1599/15990 
serialized/live bytes, 51 ops)
INFO [FlushWriter:1468] 2016-04-06 08:17:11,012 Memtable.java (line 402) 
Completed flushing 
/var/lib/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4826-Data.db
 (6348 bytes) for commitlog position