Re: cqlsh problem

2016-03-29 Thread joseph gao
Why is cassandra using tcp6 for port 9042, like:
tcp6    0    0 0.0.0.0:9042    :::*    LISTEN
Could this be the problem?
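A tcp6 socket bound to :::* is normally dual-stack and still accepts IPv4 connections,
so by itself it is usually not the cause. A minimal way to check the listener and, if
needed, force an IPv4-only socket (a sketch assuming a Linux package install; adjust
paths for your environment):

sudo ss -lntp | grep 9042        # or: sudo netstat -lntp | grep 9042
# To make the JVM prefer IPv4 sockets, add the standard JVM property to
# cassandra-env.sh and restart the node:
echo 'JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"' | sudo tee -a /etc/cassandra/cassandra-env.sh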

2016-03-30 11:34 GMT+08:00 joseph gao :

> Still have not fixed it. cqlsh: error: no such option: --connect-timeout
> cqlsh version 5.0.1
>
>
>
> 2016-03-25 16:46 GMT+08:00 Alain RODRIGUEZ :
>
>> Hi Joseph.
>>
>> As I can't reproduce here, I believe you are having network issue of some
>> kind.
>>
>> MacBook-Pro:~ alain$ cqlsh --version
>> cqlsh 5.0.1
>> MacBook-Pro:~ alain$ echo 'DESCRIBE KEYSPACES;' | cqlsh
>> --connect-timeout=5 --request-timeout=10
>> system_traces  system
>> MacBook-Pro:~ alain$
>>
>> It's been a few days, did you manage to fix it ?
>>
>> C*heers,
>> ---
>> Alain Rodriguez - al...@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2016-03-21 9:59 GMT+01:00 joseph gao :
>>
>>> cqlsh version 5.0.1. nodetool tpstats looks good, the log looks good, and I
>>> used the specified port 9042. It fails immediately (in less than 3
>>> seconds). By the way, where should I use '--connect-timeout'? cqlsh doesn't
>>> seem to have such a parameter.
>>>
>>> 2016-03-18 17:29 GMT+08:00 Alain RODRIGUEZ :
>>>
 Is the node fully healthy or rejecting some requests ?

 What are the outputs for "grep -i "ERROR"
 /var/log/cassandra/system.log" and "nodetool tpstats"?

 Any error? Any pending / blocked or dropped messages?

 Also did you try using distinct ports (9160 for thrift, 9042 for
 native) - out of curiosity, not sure this will help.

 What is your version of cqlsh "cqlsh --version" ?

 doesn't work most times. But sometimes it just works fine
>

 Do you feel like this is due to a timeout (query being too big, cluster
 being too busy)? Try setting these higher:

 --connect-timeout=CONNECT_TIMEOUT

 Specify the connection timeout in seconds
 (default: 5 seconds).

   --request-timeout=REQUEST_TIMEOUT

 Specify the default request timeout in seconds 
 (default:
 10 seconds).

 C*heers,
 ---
 Alain Rodriguez - al...@thelastpickle.com
 France

 The Last Pickle - Apache Cassandra Consulting
 http://www.thelastpickle.com

 2016-03-18 4:49 GMT+01:00 joseph gao :

> Of course yes.
>
> 2016-03-17 22:35 GMT+08:00 Vishwas Gupta :
>
>> Have you started the Cassandra service?
>>
>> sh cassandra
>> On 17-Mar-2016 7:59 pm, "Alain RODRIGUEZ"  wrote:
>>
>>> Hi, did you try with the address of the node rather than 127.0.0.1?
>>>
>>> Is the transport protocol used by cqlsh (not sure if it is thrift or
>>> binary - native in 2.1) active? What is the "nodetool info" output?
>>>
>>> C*heers,
>>> ---
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2016-03-17 14:26 GMT+01:00 joseph gao :
>>>
 hi, all
 cassandra version 2.1.7
 When I use cqlsh to connect cassandra, something is wrong

 Connection error: ('Unable to connect to any servers',
 {'127.0.0.1': OperationTimedOut('errors=None, last_host=None')})

 This happens lots of times, but sometimes it works just fine.
 Anybody know why?

 --
 --
 Joseph Gao
 PhoneNum:15210513582
 QQ: 409343351

>>>
>>>
>
>
> --
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
>


>>>
>>>
>>> --
>>> --
>>> Joseph Gao
>>> PhoneNum:15210513582
>>> QQ: 409343351
>>>
>>
>>
>
>
> --
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
>



-- 
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351


Inconsistent query results and node state

2016-03-29 Thread Jason Kania
We have encountered a query inconsistency problem wherein the following query 
sporadically returns different results, with invalid values for a timestamp 
field that look as if the field is uninitialized (a zero timestamp) in the query 
results.

Attempts to repair and compact have not changed the results.

select "subscriberId","sensorUnitId","sensorId","time" from 
"sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND 
"sensorId"=0 ORDER BY "time" LIMIT 10;

Invalid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    1969-12-31 19:00
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:11
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22

Valid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:11
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:14

We have confirmed that the 1969-12-31 timestamp is not within the data, based on 
running a number of queries, so it looks like the invalid timestamp value (the Unix 
epoch rendered in our local timezone) is generated by the query. The query below 
returns no rows.

select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND 
"sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500';

No errors are showing up in the logs, but the following was observed intermittently 
in the tracing output (not correlated with the invalid query results):

 Digest mismatch: org.apache.cassandra.service.DigestMismatchException: 
Mismatch for key DecoratedKey(-7563144029910940626, 
00064a41534b414e040400) 
(be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba) 
[ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 |
An error from the debug log that might be related is:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-4908797801227889951, 4a41534b414e) 
(6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)
    at 
org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
~[apache-cassandra-3.0.3.jar:3.0.3]
    at 
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
 ~[apache-cassandra-3.0.3.jar:3.0.3]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_74]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_74]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]

The tracing files are attached and seem to show that in the failed case, 
content is skipped because of tombstones, if we understand them correctly. This 
could be an inconsistency problem on 192.168.10.9. Unfortunately, attempts to 
compact on 192.168.10.9 only give the following error without any stack trace 
detail, and the problem is not fixed by repair.

root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact
error: null
-- StackTrace --
java.lang.ArrayIndexOutOfBoundsException
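One avenue worth trying (not mentioned in the thread, so treat it as an assumption
rather than a confirmed fix) is scrubbing the affected table, since compaction
failures like this can indicate a corrupt SSTable:

# Online scrub (keyspace/table names are placeholders):
nodetool scrub <keyspace> sensorReadingIndex

# Or, with the node stopped, the offline variant:
sstablescrub <keyspace> sensorReadingIndex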
Any suggestions on how to fix or what to search for would be much appreciated.
Thanks,
Jason



Tracing session: 09e26410-f626-11e5-8b85-9b8e819c8182

 activity | timestamp | source | source_elapsed
----------+-----------+--------+----------------
 Execute CQL3 query | 2016-03-29 23:18:13.969000 | 192.168.10.10 | 0
 Parsing select "subscriberId","sensorUnitId","sensorId","time" from "sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND "sensorId"=0 ORDER BY "time" LIMIT 10; [SharedPool-Worker-2] | 2016-03-29 23:18:13.97 | 192.168.10.10 | 181
 READ message received from /192.168.10.10 [MessagingService-Incoming-/192.168.10.10] | 2016-03-29 23:18:13.97 | 192.168.10.9 | 20

Re: cqlsh problem

2016-03-29 Thread joseph gao
Still have not fixed it. cqlsh: error: no such option: --connect-timeout
cqlsh version 5.0.1
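cqlsh 5.0.1 ships with Cassandra 2.1 and predates the --connect-timeout flag. Two
workarounds commonly suggested for that version (treat both as assumptions to verify
against your exact cqlsh build) are raising the client timeout via cqlshrc, or pointing
a newer cqlsh at the cluster, which does accept the flags:

# Reportedly read by 2.1-era cqlsh from ~/.cassandra/cqlshrc:
cat >> ~/.cassandra/cqlshrc <<'EOF'
[connection]
client_timeout = 30
EOF

# A newer cqlsh accepts the options directly:
cqlsh 127.0.0.1 9042 --connect-timeout=10 --request-timeout=30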



2016-03-25 16:46 GMT+08:00 Alain RODRIGUEZ :

> Hi Joseph.
>
> As I can't reproduce here, I believe you are having network issue of some
> kind.
>
> MacBook-Pro:~ alain$ cqlsh --version
> cqlsh 5.0.1
> MacBook-Pro:~ alain$ echo 'DESCRIBE KEYSPACES;' | cqlsh
> --connect-timeout=5 --request-timeout=10
> system_traces  system
> MacBook-Pro:~ alain$
>
> It's been a few days, did you manage to fix it ?
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-03-21 9:59 GMT+01:00 joseph gao :
>
>> cqlsh version 5.0.1. nodetool tpstats looks good, the log looks good, and I
>> used the specified port 9042. It fails immediately (in less than 3
>> seconds). By the way, where should I use '--connect-timeout'? cqlsh doesn't
>> seem to have such a parameter.
>>
>> 2016-03-18 17:29 GMT+08:00 Alain RODRIGUEZ :
>>
>>> Is the node fully healthy or rejecting some requests ?
>>>
>>> What are the outputs for "grep -i "ERROR" /var/log/cassandra/system.log"
>>> and "nodetool tpstats"?
>>>
>>> Any error? Any pending / blocked or dropped messages?
>>>
>>> Also did you try using distinct ports (9160 for thrift, 9042 for native)
>>> - out of curiosity, not sure this will help.
>>>
>>> What is your version of cqlsh "cqlsh --version" ?
>>>
>>> doesn't work most times. But sometimes it just works fine

>>>
>>> Do you feel like this is due to a timeout (query being too big, cluster
>>> being too busy)? Try setting these higher:
>>>
>>> --connect-timeout=CONNECT_TIMEOUT
>>>
>>> Specify the connection timeout in seconds
>>> (default: 5 seconds).
>>>
>>>   --request-timeout=REQUEST_TIMEOUT
>>>
>>> Specify the default request timeout in seconds 
>>> (default:
>>> 10 seconds).
>>>
>>> C*heers,
>>> ---
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2016-03-18 4:49 GMT+01:00 joseph gao :
>>>
 Of course yes.

 2016-03-17 22:35 GMT+08:00 Vishwas Gupta :

> Have you started the Cassandra service?
>
> sh cassandra
> On 17-Mar-2016 7:59 pm, "Alain RODRIGUEZ"  wrote:
>
>> Hi, did you try with the address of the node rather than 127.0.0.1?
>>
>> Is the transport protocol used by cqlsh (not sure if it is thrift or
>> binary - native in 2.1) active? What is the "nodetool info" output?
>>
>> C*heers,
>> ---
>> Alain Rodriguez - al...@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2016-03-17 14:26 GMT+01:00 joseph gao :
>>
>>> hi, all
>>> cassandra version 2.1.7
>>> When I use cqlsh to connect cassandra, something is wrong
>>>
>>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>>> OperationTimedOut('errors=None, last_host=None')})
>>>
>>> This happens lots of times, but sometimes it works just fine. Anybody
>>> know why?
>>>
>>> --
>>> --
>>> Joseph Gao
>>> PhoneNum:15210513582
>>> QQ: 409343351
>>>
>>
>>


 --
 --
 Joseph Gao
 PhoneNum:15210513582
 QQ: 409343351

>>>
>>>
>>
>>
>> --
>> --
>> Joseph Gao
>> PhoneNum:15210513582
>> QQ: 409343351
>>
>
>


-- 
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351


Re: How is the coordinator node in LOCAL_QUORUM chosen?

2016-03-29 Thread Eric Stevens
How this works is documented in greater detail at the link I provided
earlier than I can do justice to here.

TokenAware uses its configured child strategy to determine node locality.
DCAwareRoundRobin uses a configuration property, or if all of its seed
nodes are in the same DC it assumes nodes in that DC to be local.
LatencyAware uses latency metrics to determine locality.

LOCAL_XXX consistency, as the name implies, is considered satisfied if XXX
replicas in the coordinator node's local datacenter have acknowledged the
write (or answered for the read).  If your load balancer considers nodes
from multiple datacenters local (i.e. it's shipping queries to nodes that
belong to several DCs), local consistency is still evaluated only against the
local datacenter of the node which is coordinating the query - that is to
say that consistency is not a *driver* level property, but a *coordinator*
level property that is supplied by the driver.
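For completeness: the driver learns each node's datacenter from the cluster metadata,
which reflects the server-side snitch configuration. A quick way to see what DC each
node advertises (a sketch assuming a package install and GossipingPropertyFileSnitch):

# Datacenter and rack per node, as the cluster reports them:
nodetool status

# With GossipingPropertyFileSnitch the values come from this file on each node:
cat /etc/cassandra/cassandra-rackdc.properties
# dc=DC1
# rack=RAC1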

On Tue, Mar 29, 2016 at 8:01 AM X. F. Li  wrote:

> Thanks for the explanation. My questions are:
> * How does the client driver determine which Cassandra node is considered
> "local"? Is it auto-discovered (if so, how?) or manually specified
> somewhere?
> * Do local_xxx consistencies always fail when a partition is not
> replicated in the local DC, as specified in its replication strategy?
>
>  Perhaps I should ask the node.js client authors about this.
>
>
> On Monday, March 28, 2016 07:47 PM, Eric Stevens wrote:
>
> > Local quorum works in the same data center as the coordinator node,
> but when an app server execute the write query, how is the coordinator
> node chosen?
>
> It typically depends on the driver, and decent drivers offer you several
> options for this, usually called load balancing strategy.  You indicate
> that you're using the node.js driver (presumably the DataStax version),
> which is documented here:
> http://docs.datastax.com/en/developer/nodejs-driver/3.0/common/drivers/reference/tuningPolicies.html
>
> I'm not familiar with the node.js driver, but I am familiar with the Java
> driver, and since they use the same terminology RE load balancing, I'll
> assume they work the same.
>
> A typical way to set that up is to use TokenAware policy with
> DCAwareRoundRobinPolicy as its child policy.  This will prefer to route
> queries to the primary replica (or secondary replica if the primary is
> offline) in the local datacenter for that query if it can be discovered
> automatically by the driver, such as with prepared statements.
>
> Where the replica discovery can't be accomplished, TokenAware defers to
> the child policy to choose the host.  In the case of
> DCAwareRoundRobinPolicy that means it iterates through the hosts of the
> configured local datacenter (defaulted to the DC of the seed nodes if
> they're all in the same DC) for each subsequent execution.
>
> On Fri, Mar 25, 2016 at 2:04 PM X. F. Li  wrote:
>
>> Hello,
>>
>> Local quorum works in the same data center as the coordinator node, but
>> when an app server execute the write query, how is the coordinator node
>> chosen?
>>
>> I use the node.js driver. How do the driver client determine which
>> cassandra nodes are in the same DC as the client node? Does it use
>> private network IP [192.168.x.x etc] to auto detect, or must I manually
>> provide a localBalancing policy by `new DCAwareRoundRobinPolicy(
>> localDcName )`?
>>
>> If a partition is not available in the local DC, i.e. if the local
>> replica node fails or all replica nodes are in a remote DC, will local
>> quorum fail? If it doesn't fail, there is no guarantee that all
>> queries on a partition will be directed to the same data center, so does
>> it mean strong consistency cannot be expected?
>>
>> Another question:
>>
>> Suppose I have replication factor 3. If one of the nodes fails, will
>> queries with ALL consistency fail if the queried partition is on the
>> failed node? Or would they continue to work with 2 replicas during the
>> time while cassandra is replicating the partitions on the failed node to
>> re-establish 3 replicas?
>>
>> Thank you.
>> Regards,
>>
>> X. F. Li
>>
>
>


Re: Solr and vnodes anyone?

2016-03-29 Thread Eric Stevens
IIRC in DSE 4.6 using vnodes was basically always a bad idea in your Solr
datacenter.  The overhead was more than you would reasonably want to pay
unless your vnode count was low enough that you lost all the advantages anyway.

Around 4.7 there were significant performance improvements for vnodes in
DSE Solr.  In that era we experimented with several vnode counts, and 64
was where we settled as the best tradeoff between performance degradation
(increase in fanouts in the read path) and cluster management.

I believe there were even more improvements in 4.8, but we have not re-run
our earlier experiments; we're still running at scale with 64 vnodes.
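For reference, the per-node vnode count is the num_tokens setting in cassandra.yaml,
and it has to be chosen before a node first joins the ring; changing it later means
replacing or re-bootstrapping the node (a sketch assuming a package install):

grep -n '^num_tokens' /etc/cassandra/cassandra.yaml
# num_tokens: 64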

On Mon, Mar 28, 2016 at 8:59 PM Jack Krupansky 
wrote:

> Somebody recently asked me for advice on the use of Solr (DSE Search) and
> vnodes, so I was wondering... is anybody here actually using Solr/DSE
> Search with vnodes enabled? If so, with what token count? The default of
> 256 would result in somewhat suboptimal query performance, so the question
> is whether 64 or even 32 would deliver acceptable query performance.
> Anybody here have any practical experience on this issue, either in testing
> or, even better, in production?
>
> Absent any further input, my advice would be to limit DSE Search/Solr to a
> token count of 64 per node.
>
> -- Jack Krupansky
>


Re: Does saveToCassandra work with Cassandra Lucene plugin ?

2016-03-29 Thread Cleosson José Pirani de Souza
Hi Eduardo,


 It works. I think 
SPARKC-332 is fine.


Thanks,

Cleosson



From: Eduardo Alonso 
Sent: Tuesday, March 29, 2016 11:39 AM
To: user@cassandra.apache.org
Subject: Re: Does saveToCassandra work with Cassandra Lucene plugin ?

Hi,

It seems that the problem is caused by a problem in the Cassandra Spark driver, 
and not in the plugin.

Since CASSANDRA-10217, Cassandra 3.x per-row indexes no longer need to be created 
on a fake column. Thus, from Cassandra 3.x the "CREATE CUSTOM INDEX %s ON %s(%s)" 
column-based syntax is replaced with the new "CREATE CUSTOM INDEX %s ON %s()" 
row-based syntax. However, the DataStax Spark driver doesn't seem to support this 
new feature yet.
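For illustration, the two forms look like this (a syntax sketch only, based on the
format strings above; the real Lucene index also needs the WITH OPTIONS schema shown
further down):

# Column-based syntax (Cassandra 2.x, fake indexed column):
cqlsh -e "CREATE CUSTOM INDEX tweets_index ON demo.tweets (lucene) USING 'com.stratio.cassandra.lucene.Index';"

# Row-based syntax (Cassandra 3.x, no fake column):
cqlsh -e "CREATE CUSTOM INDEX tweets_index ON demo.tweets () USING 'com.stratio.cassandra.lucene.Index';"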

When "com.datastax.spark.connector.RDDFunctions.saveToCassandra" is called it 
tries to load the table schema and the index schema related to a table column. 
Since this new index syntax does not have the fake-column anymore it results in 
a NoSuchElementException due to an empty column name.

However, saveToCassandra works well if you execute the same example with prior 
fake column syntax:

CREATE KEYSPACE demo
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
USE demo;
CREATE TABLE tweets (
id INT PRIMARY KEY,
user TEXT,
body TEXT,
time TIMESTAMP,
latitude FLOAT,
longitude FLOAT,
lucene TEXT
);

CREATE CUSTOM INDEX tweets_index ON tweets (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '1',
'schema' : '{
fields : {
id: {type : "integer"},
user  : {type : "string"},
body  : {type : "text", analyzer : "english"},
time  : {type : "date", pattern : "/MM/dd", sorted : true},
place : {type : "geo_point", latitude:"latitude", 
longitude:"longitude"}
}
}'
};

Should we open a new JIRA about this or extend 
SPARKC-332 ?

Regards

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // 
@stratiobd

2016-03-29 14:02 GMT+02:00 Cleosson José Pirani de Souza 
>:

Hi Eduardo,


 As I was not sure that it is a bug, I preferred to send the e-mail to the list first. 
It could be that something was done wrong.

 The versions are:

  *   Spark 1.6.0
  *   Cassandra 3.0.3
  *   Lucene plugin 3.0.3.1

 I opened the bug. The link 
https://github.com/Stratio/cassandra-lucene-index/issues/109

 If it is not a bug, let me know.


Thanks,

Cleosson



From: Eduardo Alonso 
>
Sent: Tuesday, March 29, 2016 6:57 AM

To: user@cassandra.apache.org
Subject: Re: Does saveToCassandra work with Cassandra Lucene plugin ?


Hi Cleosson Jose,

First of all, if you think this is caused by a cassandra-lucene-index bug, 
this user list is not the best way to report it. Please use GitHub 
issues for this.

Second, in order to reproduce this error, I need to know which versions of 
Cassandra, cassandra-lucene-index, Spark and spark-cassandra-connector you are 
using.

Regards

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // 
@stratiobd

2016-03-28 23:43 GMT+02:00 Cleosson José Pirani de Souza 
>:

Hi Jack,


 Yes, I used the exact same commands in the Stratio readme.


Thanks,

Cleososn



From: Jack Krupansky >
Sent: Monday, March 28, 2016 6:06 PM
To: user@cassandra.apache.org

Subject: Re: Does saveToCassandra work with Cassandra Lucene plugin ?

The exception message has an empty column name. Odd. Not sure if that is a bug 
in the exception code or whether you actually have an empty column name 
somewhere.

Did you use the absolutely exact same commands to create the keyspace, table, 
and custom index as in the Stratio readme?

-- Jack Krupansky

On Mon, Mar 28, 2016 at 4:57 PM, Cleosson José Pirani de Souza 
> wrote:

Hi,

 One important thing: if I remove the custom Lucene index, saveToCassandra works.


Thanks

Cleosson


Re: Drop and Add column with different datatype in Cassandra

2016-03-29 Thread Tyler Hobbs
On Tue, Mar 29, 2016 at 10:31 AM, Bhupendra Baraiya <
bhupendra.bara...@continuum.net> wrote:

> Does it mean Cassandra does not allow adding of the same column in the
> Table even though it does not exists in the Table
>

As the error message says, you can't re-add a *collection* column with the
same name.  Other types of columns are fine.
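A common way around this restriction (a sketch, not from the thread; the keyspace name
and the map's value type below are placeholders) is to re-add the data under a new
column name:

cqlsh -e "ALTER TABLE demo.user DROP last_login;"
# Re-adding a collection under the old name fails, but a new name works:
cqlsh -e "ALTER TABLE demo.user ADD last_login_v2 map<text, frozen<map<text, timestamp>>>;"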


-- 
Tyler Hobbs
DataStax 


Drop and Add column with different datatype in Cassandra

2016-03-29 Thread Bhupendra Baraiya
Hi ,

I am using Cassandra 3.0.4.
I am using a table called User which has a column last_login with a map datatype.

I dropped the column using the syntax ALTER TABLE User DROP last_login;

I then tried to recreate the column with a nested map datatype for last_login,
but I got the error below:

InvalidRequest: code=2200 [Invalid query] message="Cannot add a collection with
the name last_login because a collection with the same name and a different type
(map) has already been used in the past"

Does this mean Cassandra does not allow re-adding a column with the same name to the 
table even though it no longer exists in the table?

Thanks and regards,

Bhupendra Baraiya
Continuum Managed Services, LLC.
p: 902-933-0019
e: bhupendra.bara...@continuum.net
w: continuum.net



Re: Does saveToCassandra work with Cassandra Lucene plugin ?

2016-03-29 Thread Eduardo Alonso
Hi,

It seems that the problem is caused by a problem in the Cassandra Spark
driver, and not in the plugin.

Since CASSANDRA-10217, Cassandra 3.x per-row indexes no longer need to be created
on a fake column. Thus, from Cassandra 3.x the "CREATE CUSTOM INDEX %s ON %s(%s)"
column-based syntax is replaced with the new "CREATE CUSTOM INDEX %s ON %s()"
row-based syntax. However, the DataStax Spark driver doesn't seem to support
this new feature yet.

When "com.datastax.spark.connector.RDDFunctions.saveToCassandra" is called
it tries to load the table schema and the index schema related to a table
column. Since this new index syntax does not have the fake-column anymore
it results in a NoSuchElementException due to an empty column name.

However, saveToCassandra works well if you execute the same example with
prior fake column syntax:

CREATE KEYSPACE demo
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
USE demo;
CREATE TABLE tweets (
    id INT PRIMARY KEY,
    user TEXT,
    body TEXT,
    time TIMESTAMP,
    latitude FLOAT,
    longitude FLOAT,
    lucene TEXT
);

CREATE CUSTOM INDEX tweets_index ON tweets (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
    'refresh_seconds' : '1',
    'schema' : '{
        fields : {
            id    : {type : "integer"},
            user  : {type : "string"},
            body  : {type : "text", analyzer : "english"},
            time  : {type : "date", pattern : "/MM/dd", sorted : true},
            place : {type : "geo_point", latitude:"latitude", longitude:"longitude"}
        }
    }'
};

Should we open a new JIRA about this or extend SPARKC-332
 ?

Regards

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
*

2016-03-29 14:02 GMT+02:00 Cleosson José Pirani de Souza <
cso...@daitangroup.com>:

> Hi Eduardo,
>
>
>  As I was not sure that it is a bug, I preferred to send the e-mail to the list
> first. It could be that something was done wrong.
>
>  The versions are:
>
>- Spark 1.6.0
>- Cassandra 3.0.3
>- Lucene plugin 3.0.3.1
>
>  I opened the bug. The link
> https://github.com/Stratio/cassandra-lucene-index/issues/109
>
>  If it is not a bug, let me know.
>
>
> Thanks,
>
> Cleosson
>
>
> --
> *From:* Eduardo Alonso 
> *Sent:* Tuesday, March 29, 2016 6:57 AM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: Does saveToCassandra work with Cassandra Lucene plugin ?
>
>
> Hi Cleosson Jose,
>
> First of all, if you think this is caused by a cassandra-lucene-index
> bug, this user list is not the best way to report it. Please use GitHub
> issues for this.
>
> Second, in order to reproduce this error, I need to know which versions
> of Cassandra, cassandra-lucene-index, Spark and spark-cassandra-connector
> you are using.
>
> Regards
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
> *
>
> 2016-03-28 23:43 GMT+02:00 Cleosson José Pirani de Souza <
> cso...@daitangroup.com>:
>
>> Hi Jack,
>>
>>
>>  Yes, I used the exact same commands in the Stratio readme.
>>
>>
>> Thanks,
>>
>> Cleososn
>>
>>
>> --
>> *From:* Jack Krupansky 
>> *Sent:* Monday, March 28, 2016 6:06 PM
>> *To:* user@cassandra.apache.org
>>
>> *Subject:* Re: Does saveToCassandra work with Cassandra Lucene plugin ?
>>
>> The exception message has an empty column name. Odd. Not sure if that is
>> a bug in the exception code or whether you actually have an empty column
>> name somewhere.
>>
>> Did you use the absolutely exact same commands to create the keyspace,
>> table, and custom index as in the Stratio readme?
>>
>> -- Jack Krupansky
>>
>> On Mon, Mar 28, 2016 at 4:57 PM, Cleosson José Pirani de Souza <
>> cso...@daitangroup.com> wrote:
>>
>>> Hi,
>>>
>>>  One important thing: if I remove the custom Lucene index,
>>> saveToCassandra works.
>>>
>>>
>>> Thanks
>>>
>>> Cleosson
>>>
>>>
>>> --
>>> *From:* Anuj Wadehra 
>>> *Sent:* Monday, March 28, 2016 3:27 PM
>>> *To:* user@cassandra.apache.org; Cleosson José Pirani de Souza;
>>> user@cassandra.apache.org
>>> *Subject:* Re: Does saveToCassandra work with Cassandra Lucene plugin ?
>>>
>>> I used it with Java, and there every field of the POJO must map to a column
>>> name of the table. I think someone with Scala syntax knowledge can help
>>> you better.
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> 

Re: How is the coordinator node in LOCAL_QUORUM chosen?

2016-03-29 Thread X. F. Li

Thanks for the explanation. My questions are:
* How does the client driver determine which Cassandra node is 
considered "local"? Is it auto-discovered (if so, how?) or manually 
specified somewhere?
* Do local_xxx consistencies always fail when a partition is not 
replicated in the local DC, as specified in its replication strategy?


 Perhaps I should ask the node.js client authors about this.

On Monday, March 28, 2016 07:47 PM, Eric Stevens wrote:
> Local quorum works in the same data center as the coordinator node, 
but when an app server execute the write query, how is the coordinator 
node chosen?


It typically depends on the driver, and decent drivers offer you 
several options for this, usually called load balancing strategy.  You 
indicate that you're using the node.js driver (presumably the DataStax 
version), which is documented here: 
http://docs.datastax.com/en/developer/nodejs-driver/3.0/common/drivers/reference/tuningPolicies.html


I'm not familiar with the node.js driver, but I am familiar with the 
Java driver, and since they use the same terminology RE load 
balancing, I'll assume they work the same.


A typical way to set that up is to use TokenAware policy with 
DCAwareRoundRobinPolicy as its child policy.  This will prefer to 
route queries to the primary replica (or secondary replica if the 
primary is offline) in the local datacenter for that query if it can 
be discovered automatically by the driver, such as with prepared 
statements.


Where the replica discovery can't be accomplished, TokenAware defers 
to the child policy to choose the host.  In the case of 
DCAwareRoundRobinPolicy that means it iterates through the hosts of 
the configured local datacenter (defaulted to the DC of the seed nodes 
if they're all in the same DC) for each subsequent execution.


On Fri, Mar 25, 2016 at 2:04 PM X. F. Li > wrote:


Hello,

Local quorum works in the same data center as the coordinator
node, but
when an app server execute the write query, how is the coordinator
node
chosen?

I use the node.js driver. How do the driver client determine which
cassandra nodes are in the same DC as the client node? Does it use
private network IP [192.168.x.x etc] to auto detect, or must I
manually
provide a localBalancing policy by `new DCAwareRoundRobinPolicy(
localDcName )`?

If a partition is not available in the local DC, i.e. if the local
replica node fails or all replica nodes are in a remote DC, will local
quorum fail? If it doesn't fail, there is no guarantee that all
queries on a partition will be directed to the same data center,
so does it mean strong consistency cannot be expected?

Another question:

Suppose I have replication factor 3. If one of the nodes fails, will
queries with ALL consistency fail if the queried partition is on the
failed node? Or would they continue to work with 2 replicas during the
time while cassandra is replicating the partitions on the failed
node to
re-establish 3 replicas?

Thank you.
Regards,

X. F. Li





Re: Acceptable repair time

2016-03-29 Thread Kai Wang
IIRC when we switched to LCS and ran the first full repair with 250GB/RF=3,
it took at least 12 hours for the repair to finish, then another 3+ days
for all the compaction to catch up. I called it "the big bang of LCS".

Since then we've been running nightly incremental repair.

For me, as long as it's reliable (no streaming errors, better progress
reporting, etc.), I actually don't mind if it takes more than a few hours to
do a full repair. But I am not sure about 4 days... I guess it depends on
the size of the cluster and data...
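For reference, a minimal sketch of the kind of nightly incremental repair mentioned
above (the -inc flag applies to Cassandra 2.1, where incremental repair is opt-in;
from 2.2 onward plain "nodetool repair" is incremental by default and -full requests
a full repair):

# /etc/cron.d/cassandra-repair (illustrative; stagger the start time per node)
0 2 * * * cassandra nodetool repair -inc > /var/log/cassandra/nightly-repair.log 2>&1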

On Tue, Mar 29, 2016 at 6:04 AM, Anishek Agarwal  wrote:

> I would really like to know the answer to the above, because on some nodes
> repair takes almost 4 days for us :(.
>
> On Tue, Mar 29, 2016 at 8:34 AM, Jack Krupansky 
> wrote:
>
>> Someone recently asked me for advice when their repair time was 2-3 days.
>> I thought that was outrageous, but not unheard of. Personally, to me, 2-3
>> hours would be about the limit of what I could tolerate, and my personal
>> goal would be that a full repair of a node should take no longer than an
>> hour, maybe 90 minutes tops. But... achieving those more abbreviated repair
>> times would strongly suggest that the amount of data on each node be kept
>> down to a tiny fraction of a typical spinning disk drive, or even a
>> fraction of a larger SSD drive.
>>
>> So, my question here is what people consider acceptable full repair times
>> for nodes and what the resulting node data size is.
>>
>> What impact vnodes has on these numbers is a bonus question.
>>
>> Thanks!
>>
>> -- Jack Krupansky
>>
>
>


Re: Does saveToCassandra work with Cassandra Lucene plugin ?

2016-03-29 Thread Cleosson José Pirani de Souza
Hi Eduardo,


 As I was not sure that it is a bug, I preferred to send the e-mail to the list first. 
It could be that something was done wrong.

 The versions are:

  *   Spark 1.6.0
  *   Cassandra 3.0.3
  *   Lucene plugin 3.0.3.1

 I opened the bug. The link 
https://github.com/Stratio/cassandra-lucene-index/issues/109

 If it is not a bug, let me know.


Thanks,

Cleosson



From: Eduardo Alonso 
Sent: Tuesday, March 29, 2016 6:57 AM
To: user@cassandra.apache.org
Subject: Re: Does saveToCassandra work with Cassandra Lucene plugin ?


Hi Cleosson Jose,

First of all, if you think this is caused by a cassandra-lucene-index bug, 
this user list is not the best way to report it. Please use GitHub 
issues for this.

Second, in order to reproduce this error, I need to know which versions of 
Cassandra, cassandra-lucene-index, Spark and spark-cassandra-connector you are 
using.

Regards

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // 
@stratiobd

2016-03-28 23:43 GMT+02:00 Cleosson José Pirani de Souza 
>:

Hi Jack,


 Yes, I used the exact same commands in the Stratio readme.


Thanks,

Cleososn



From: Jack Krupansky >
Sent: Monday, March 28, 2016 6:06 PM
To: user@cassandra.apache.org

Subject: Re: Does saveToCassandra work with Cassandra Lucene plugin ?

The exception message has an empty column name. Odd. Not sure if that is a bug 
in the exception code or whether you actually have an empty column name 
somewhere.

Did you use the absolutely exact same commands to create the keyspace, table, 
and custom index as in the Stratio readme?

-- Jack Krupansky

On Mon, Mar 28, 2016 at 4:57 PM, Cleosson José Pirani de Souza 
> wrote:

Hi,

 One important thing: if I remove the custom Lucene index, saveToCassandra works.


Thanks

Cleosson



From: Anuj Wadehra >
Sent: Monday, March 28, 2016 3:27 PM
To: user@cassandra.apache.org; Cleosson José 
Pirani de Souza; user@cassandra.apache.org
Subject: Re: Does saveToCassandra work with Cassandra Lucene plugin ?

I used it with Java, and there every field of the POJO must map to a column name of 
the table. I think someone with Scala syntax knowledge can help you better.


Thanks
Anuj

Sent from Yahoo Mail on 
Android

On Mon, 28 Mar, 2016 at 11:47 pm, Anuj Wadehra
> wrote:
With my limited experience with Spark, I can tell you that you need to make 
sure that all columns mentioned in SomeColumns are part of the CQL schema of 
the table.


Thanks
Anuj

Sent from Yahoo Mail on 
Android

On Mon, 28 Mar, 2016 at 11:38 pm, Cleosson José Pirani de Souza
> wrote:


Hello,




I am implementing the example from the GitHub repository 
(https://github.com/Stratio/cassandra-lucene-index), and when I try to save the 
data using saveToCassandra I get a NoSuchElementException.
 If I use CassandraConnector.withSessionDo I am able to add elements into 
Cassandra and no exception is raised.


 The code:

import org.apache.spark.{SparkConf, SparkContext, Logging}
import com.datastax.spark.connector.cql.CassandraConnector
import com.datastax.spark.connector._

object App extends Logging {
  def main(args: Array[String]) {

    // Get the cassandra IP and create the spark context
    val cassandraIP = System.getenv("CASSANDRA_IP");
    val sparkConf = new SparkConf(true)
      .set("spark.cassandra.connection.host", cassandraIP)
      .set("spark.cleaner.ttl", "3600")
      .setAppName("Simple Spark Cassandra Example")

    val sc = new SparkContext(sparkConf)

    // Works
    CassandraConnector(sparkConf).withSessionDo { session =>
      session.execute("INSERT INTO demo.tweets(id, user, body, time, latitude, longitude) VALUES (19, 'Name', 'Body', '2016-03-19 09:00:00-0300', 39, 39)")
    }

    // Does not work
    val demo = sc.parallelize(Seq((9, "Name", "Body", "2016-03-29 19:00:00-0300", 29, 29)))
    // Raises the exception
    demo.saveToCassandra("demo", "tweets", SomeColumns("id", "user", "body", "time",

Re: Acceptable repair time

2016-03-29 Thread Anishek Agarwal
I would really like to know the answer to the above, because on some nodes
repair takes almost 4 days for us :(.

On Tue, Mar 29, 2016 at 8:34 AM, Jack Krupansky 
wrote:

> Someone recently asked me for advice when their repair time was 2-3 days.
> I thought that was outrageous, but not unheard of. Personally, to me, 2-3
> hours would be about the limit of what I could tolerate, and my personal
> goal would be that a full repair of a node should take no longer than an
> hour, maybe 90 minutes tops. But... achieving those more abbreviated repair
> times would strongly suggest that the amount of data on each node be kept
> down to a tiny fraction of a typical spinning disk drive, or even a
> fraction of a larger SSD drive.
>
> So, my question here is what people consider acceptable full repair times
> for nodes and what the resulting node data size is.
>
> What impact vnodes has on these numbers is a bonus question.
>
> Thanks!
>
> -- Jack Krupansky
>


Re: Does saveToCassandra work with Cassandra Lucene plugin ?

2016-03-29 Thread Eduardo Alonso
Hi Cleosson Jose,

First of all, if you think this is caused by a cassandra-lucene-index
bug, this user list is not the best way to report it. Please use GitHub
issues for this.

Second, in order to reproduce this error, I need to know which versions of
Cassandra, cassandra-lucene-index, Spark and spark-cassandra-connector you
are using.

Regards

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
*

2016-03-28 23:43 GMT+02:00 Cleosson José Pirani de Souza <
cso...@daitangroup.com>:

> Hi Jack,
>
>
>  Yes, I used the exact same commands in the Stratio readme.
>
>
> Thanks,
>
> Cleososn
>
>
> --
> *From:* Jack Krupansky 
> *Sent:* Monday, March 28, 2016 6:06 PM
> *To:* user@cassandra.apache.org
>
> *Subject:* Re: Does saveToCassandra work with Cassandra Lucene plugin ?
>
> The exception message has an empty column name. Odd. Not sure if that is a
> bug in the exception code or whether you actually have an empty column name
> somewhere.
>
> Did you use the absolutely exact same commands to create the keyspace,
> table, and custom index as in the Stratio readme?
>
> -- Jack Krupansky
>
> On Mon, Mar 28, 2016 at 4:57 PM, Cleosson José Pirani de Souza <
> cso...@daitangroup.com> wrote:
>
>> Hi,
>>
>>  One important thing: if I remove the custom Lucene index,
>> saveToCassandra works.
>>
>>
>> Thanks
>>
>> Cleosson
>>
>>
>> --
>> *From:* Anuj Wadehra 
>> *Sent:* Monday, March 28, 2016 3:27 PM
>> *To:* user@cassandra.apache.org; Cleosson José Pirani de Souza;
>> user@cassandra.apache.org
>> *Subject:* Re: Does saveToCassandra work with Cassandra Lucene plugin ?
>>
>> I used it with Java, and there every field of the POJO must map to a column
>> name of the table. I think someone with Scala syntax knowledge can help
>> you better.
>>
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Mon, 28 Mar, 2016 at 11:47 pm, Anuj Wadehra
>>  wrote:
>> With my limited experience with Spark, I can tell you that you need to
>> make sure that all columns mentioned in SomeColumns are part of the CQL
>> schema of the table.
>>
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Mon, 28 Mar, 2016 at 11:38 pm, Cleosson José Pirani de Souza
>>  wrote:
>>
>>
>>
>> Hello,
>>
>>
>>
>> I am implementing the example from the GitHub repository
>> (https://github.com/Stratio/cassandra-lucene-index), and when I try to
>> save the data using saveToCassandra I get a
>> NoSuchElementException.
>>  If I use CassandraConnector.withSessionDo I am able to add elements into
>> Cassandra and no exception is raised.
>>
>>
>>  The code:
>>
>> import org.apache.spark.{SparkConf, SparkContext, Logging}
>> import com.datastax.spark.connector.cql.CassandraConnector
>> import com.datastax.spark.connector._
>>
>> object App extends Logging {
>>   def main(args: Array[String]) {
>>
>>     // Get the cassandra IP and create the spark context
>>     val cassandraIP = System.getenv("CASSANDRA_IP");
>>     val sparkConf = new SparkConf(true)
>>       .set("spark.cassandra.connection.host", cassandraIP)
>>       .set("spark.cleaner.ttl", "3600")
>>       .setAppName("Simple Spark Cassandra Example")
>>
>>     val sc = new SparkContext(sparkConf)
>>
>>     // Works
>>     CassandraConnector(sparkConf).withSessionDo { session =>
>>       session.execute("INSERT INTO demo.tweets(id, user, body, time, latitude, longitude) VALUES (19, 'Name', 'Body', '2016-03-19 09:00:00-0300', 39, 39)")
>>     }
>>
>>     // Does not work
>>     val demo = sc.parallelize(Seq((9, "Name", "Body", "2016-03-29 19:00:00-0300", 29, 29)))
>>     // Raises the exception
>>     demo.saveToCassandra("demo", "tweets", SomeColumns("id", "user", "body", "time", "latitude", "longitude"))
>>   }
>> }
>>
>>
>>
>>
>>  The exception:
>> 16/03/28 14:15:41 INFO CassandraConnector: Connected to Cassandra
>> cluster: Test Cluster
>> Exception in thread "main" java.util.NoSuchElementException: Column  not
>> found in demo.tweets
>> at
>> com.datastax.spark.connector.cql.StructDef$$anonfun$columnByName$2.apply(Schema.scala:60)
>> at
>> com.datastax.spark.connector.cql.StructDef$$anonfun$columnByName$2.apply(Schema.scala:60)
>> at scala.collection.Map$WithDefault.default(Map.scala:52)
>> at scala.collection.MapLike$class.apply(MapLike.scala:141)
>> at scala.collection.AbstractMap.apply(Map.scala:58)
>> at
>> 

RE: How many nodes do we require

2016-03-29 Thread Jacques-Henri Berthemet
Because if you lose a node, you risk losing some data forever if it was not yet 
replicated.

--
Jacques-Henri Berthemet

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: vendredi 25 mars 2016 19:37
To: user@cassandra.apache.org
Subject: Re: How many nodes do we require

Why would using CL-ONE make your cluster fragile? This isn't obvious to me. 
It's the most practical setting for high availability, which very much says 
"not fragile".
On Fri, Mar 25, 2016 at 10:44 AM Jacques-Henri Berthemet 
>
 wrote:
I found this calculator very convenient:
http://www.ecyrd.com/cassandracalculator/

Regardless of your other DCs you need RF=3 if you write at LOCAL_QUORUM, RF=2 
if you write/read at ONE.

Obviously using ONE as CL makes your cluster very fragile.
--
Jacques-Henri Berthemet


-Original Message-
From: Rakesh Kumar 
[mailto:rakeshkumar46...@gmail.com]
Sent: vendredi 25 mars 2016 18:14
To: user@cassandra.apache.org
Subject: Re: How many nodes do we require

On Fri, Mar 25, 2016 at 11:45 AM, Jack Krupansky
> wrote:
> It depends on how much data you have. A single node can store a lot of data,
> but the more data you have the longer a repair or node replacement will
> take. How long can you tolerate for a full repair or node replacement?

At this time, and for the foreseeable future, the size of the data will not be
significant, so we can safely disregard the above as a decision factor.

>
> Generally, RF=3 is both sufficient and recommended.

Are you suggesting SimpleStrategy with RF=3
or NetworkTopologyStrategy with RF=3?


taken from:

https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

"
Three replicas in each data center: This configuration tolerates
either the failure of one node per replication group at a strong
consistency level of LOCAL_QUORUM or multiple node failures per data
center using consistency level ONE."

In our case, with only 3 nodes in each DC, wouldn't RF=3 effectively mean ALL?

I will state our requirement clearly:

If we are going with six nodes (3 in each DC), we should be able to
write even with the loss of one DC and the loss of one node in the surviving
DC. I am open to hearing what compromise we would have to make on reads
while a DC is down. For us, writes are critical, more than reads.

Maybe this is not possible with 6 nodes and requires more. Please advise.
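As a concrete starting point for the requirement above (a sketch assuming two
datacenters named DC1 and DC2 and writes at LOCAL_QUORUM): with RF=3 in each DC,
LOCAL_QUORUM needs 2 of the 3 local replicas, so writes keep working after losing
the other DC entirely plus one node in the surviving DC.

cqlsh -e "CREATE KEYSPACE app WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
# In the application, set the write consistency level to LOCAL_QUORUM.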