RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John
Thanks. I thought you had given up Lucene for Spark, but it seems your Lucene 
setup still works.

Spark also has a Cassandra connector, and my questions were more about that.
From 
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md,
it seems there are limitations on how the data can be selected to support 
ad hoc queries. Filtering seems mostly limited to clustering columns. In other 
cases it would probably result in a full scan, which is going to be very slow.
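
To make the concern concrete, here is a minimal sketch of the usual CQL restriction rule: a query avoids a full scan only when it restricts the whole partition key, and clustering columns can be restricted only as a prefix of their declared order. This is a simplified model, not the connector's actual logic; it ignores secondary indexes and ALLOW FILTERING, and the `classify_query` helper and the column names are hypothetical.

```python
def classify_query(partition_key, clustering_cols, filters):
    """Classify a set of filtered columns against a (hypothetical) table layout.

    partition_key   -- list of partition key column names
    clustering_cols -- list of clustering column names, in declared order
    filters         -- collection of column names the ad hoc query filters on
    """
    filters = set(filters)
    # Every partition key column must be restricted to locate a partition.
    if not set(partition_key) <= filters:
        return "full scan"
    remaining = filters - set(partition_key)
    # Clustering columns can only be restricted as a prefix of their order.
    for col in clustering_cols:
        if col in remaining:
            remaining.discard(col)
        else:
            break
    # Anything left is a regular column or a clustering column after a gap.
    if remaining:
        return "partition lookup + extra filtering"
    return "partition lookup"


if __name__ == "__main__":
    # Hypothetical table: PRIMARY KEY ((sensor_id), day, hour)
    print(classify_query(["sensor_id"], ["day", "hour"], {"sensor_id", "day"}))
    print(classify_query(["sensor_id"], ["day", "hour"], {"status"}))
```

With many independent conditional filters, most combinations fall into the "full scan" or "extra filtering" branches, which is the limitation described above.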

Regards,
John

From: siddharth verma [mailto:sidd.verma29.l...@gmail.com]
Sent: Monday, January 30, 2017 10:20 PM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options

Hi,
> Are you using the DataStax connector as well?
Yes, we used it to query the Lucene index.

> Does it support querying against any column well (not just clustering columns)?
Yes, it does. We used Lucene particularly for this purpose.
(For more details, see:
1. https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
2. https://www.youtube.com/watch?v=Hg5s-hXy_-M)

> I’m wondering how it could build the index around them “on-the-fly”
You can build indexes at run time, but it takes time (it took a lot of time on 
our cluster, and CPU utilization went through the roof).

> did you use Spark for the full set of data or just partial
We weren't allowed to install Spark (a tech decision).
Some tech discussions are going on about the bulk job ecosystem.

Hence, as a workaround, we used a faster scan utility.
For all ad hoc purposes/scripts, you could do a full scan.

I hope it helps.

Regards


On Tue, Jan 31, 2017 at 4:11 AM, Yu, John wrote:
A follow up question is: did you use Spark for the full set of data or just 
partial? In our case, I feel we need all the data to support ad hoc queries 
(with multiple conditional filters).

Thanks,
John

From: Yu, John [mailto:john...@sandc.com]
Sent: Monday, January 30, 2017 12:04 AM
To: user@cassandra.apache.org
Subject: RE: [External] Re: Cassandra ad hoc search options

Thanks for the input! Are you using the DataStax connector as well? Does it 
support querying against any column well (not just clustering columns)? I’m 
wondering how it could build the index around them “on-the-fly”.

Regards,
John

From: siddharth verma [mailto:sidd.verma29.l...@gmail.com]
Sent: Friday, January 27, 2017 12:15 AM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options

Hi
We used the Stratio Lucene plugin with C* 3.0.3.

It helped solve a number of read patterns and served well for prefix queries.
But it created problems, as repairs failed repeatedly.
We might have used it suboptimally, not sure.

Later, we had to do away with it, and tried to serve most of the read patterns 
with materialised views (currently on C* 3.0.9).

Currently, for ad hoc queries, we use Spark or a full scan.

Regards,

On Fri, Jan 27, 2017 at 1:03 PM, Yu, John wrote:
Thanks a lot. Would you mind sharing a couple of points where you feel it’s better than 
the alternatives?

Regards,
John

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Thursday, January 26, 2017 2:33 PM
To: user@cassandra.apache.org
Subject: [External] Re: Cassandra ad hoc search options

> With Cassandra, what are the options for ad hoc query/search similar to RDBMS?

Your best options are Spark w/ the DataStax connector or Presto.  Cassandra 
isn't built for ad-hoc queries so you need to use other tools to make it work.

On Thu, Jan 26, 2017 at 2:22 PM Yu, John wrote:
Hi All,

Hope I can get some help here. We’re using Cassandra for services, and recently 
we’ve been adding UI support.
With Cassandra, what are the options for ad hoc query/search similar to RDBMS? 
We love the features of Cassandra, but it seems to be a known “weakness” that it 
doesn’t come with strong support for indexing and ad hoc queries. There has been some 
recent development with SASI as part of secondary indexes. However, I heard in a 
video that it should not be used extensively.

Does anyone have much experience with SASI? How does it compare to the Lucene plugin?
What is the direction of Apache Cassandra in the search area?

We’re also looking into Solr or ElasticSearch integration, but it seems it 
might take more effort and possibly involve data duplication.
For Solr, we don’t have DSE.
Sorry if this has been asked before, but I haven’t seen a more complete answer.

Thanks!
John

NOTICE OF CONFIDENTIALITY:
This message may contain information that is considered confidential and which 
may be prohibited from disclosure under applicable law or by contractual 
agreement. The information is intended solely for the 

RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John
Does this work with Cassandra, or provide an alternative? Thanks.

From: vincent gromakowski [mailto:vincent.gromakow...@gmail.com]
Sent: Monday, January 30, 2017 11:38 PM
To: user@cassandra.apache.org
Subject: Re: [External] Re: Cassandra ad hoc search options


I gave a try on spark+filodb and it's very interesting for ad-hoc queries


Re: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread vincent gromakowski
I gave Spark + FiloDB a try and it's very interesting for ad hoc queries.


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-30 Thread kurt greaves
On 30 January 2017 at 04:43, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> But how will I tell the rebuild command the source DC if I have more than 2 DCs?



You will need to rebuild the new DC from at least one DC for every keyspace
present on both the new DC and the old DCs.
For example, if you have 2 DCs, A and B, and add a new DC "C", with keyspace
"X" replicated to A and C and keyspace "Y" replicated to B and C, you will
need to rebuild the nodes in "C" from both DCs A and B; otherwise they
will not stream a full set of data for both keyspaces.

If all your keyspaces are replicated to all DCs, you only need to rebuild
from one other DC (which one doesn't *really* matter).

Note that if you rebuild multiple times on a node you will end up with
duplicate data. This isn't an issue, as compactions will clean it up over
time. Usually, if a rebuild fails for any reason, you should wipe the data
directory to ensure you don't end up with 2 copies of a lot of the data.
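
The rule above can be sketched as a small set-cover calculation. This is only an illustration under stated assumptions: the `rebuild_sources` helper and the keyspace-to-DC mapping are hypothetical, and the greedy choice is one simple way to pick sources, not something nodetool does for you.

```python
def rebuild_sources(new_dc, keyspaces):
    """Return source DCs to rebuild `new_dc` from so every keyspace is covered.

    keyspaces maps keyspace name -> set of DCs that keyspace is replicated to.
    Greedy: repeatedly pick the DC that holds the most still-uncovered keyspaces.
    """
    pending = {}
    for ks, dcs in keyspaces.items():
        others = set(dcs) - {new_dc}
        # Only keyspaces on the new DC that also live elsewhere need streaming.
        if new_dc in dcs and others:
            pending[ks] = others

    sources = []
    while pending:
        candidates = sorted({dc for dcs in pending.values() for dc in dcs})
        best = max(candidates,
                   key=lambda dc: sum(dc in dcs for dcs in pending.values()))
        sources.append(best)
        # Drop every keyspace that the chosen DC can stream in full.
        pending = {ks: dcs for ks, dcs in pending.items() if best not in dcs}
    return sources


if __name__ == "__main__":
    # The example from this message: X on A and C, Y on B and C.
    print(rebuild_sources("C", {"X": {"A", "C"}, "Y": {"B", "C"}}))
    # All keyspaces on all DCs: a single source DC is enough.
    print(rebuild_sources("C", {"X": {"A", "B", "C"}, "Y": {"A", "B", "C"}}))
```

Each DC name returned would then be used as the source for one rebuild run on the nodes of the new DC.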


Re: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread siddharth verma
Hi,
*Are you using the DataStax connector as well?*
Yes, we used it to query the Lucene index.

*Does it support querying against any column well (not just clustering
columns)?*
Yes, it does. We used Lucene particularly for this purpose.
(For more details, see:
1. https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
2. https://www.youtube.com/watch?v=Hg5s-hXy_-M)

*I’m wondering how it could build the index around them “on-the-fly”*
You can build indexes at run time, but it takes time (it took a lot of time on
our cluster, and CPU utilization went through the roof).

*did you use Spark for the full set of data or just partial*
We weren't allowed to install Spark (a tech decision).
Some tech discussions are going on about the bulk job ecosystem.

Hence, as a workaround, we used a faster scan utility.
For all ad hoc purposes/scripts, you could do a full scan.
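
One common way to speed up a full scan is to split the token ring into subranges and query each subrange separately, possibly in parallel. The sketch below only generates the CQL strings. It assumes the Murmur3Partitioner token range, and the helper names, keyspace, table and key column are hypothetical; it is not necessarily how the scan utility mentioned here works.

```python
MIN_TOKEN = -2**63       # lower bound of the Murmur3Partitioner range (assumed)
MAX_TOKEN = 2**63 - 1    # upper bound of the Murmur3Partitioner range (assumed)


def token_subranges(splits):
    """Split the full token range into `splits` contiguous (start, end) pairs."""
    width = (MAX_TOKEN - MIN_TOKEN) // splits
    bounds = [MIN_TOKEN + i * width for i in range(splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_queries(keyspace, table, pk, splits):
    """Build one SELECT per token subrange; together they cover the whole table."""
    queries = []
    for i, (start, end) in enumerate(token_subranges(splits)):
        # The first range includes its lower bound so that no token is skipped.
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({pk}) {lower} {start} AND token({pk}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    for q in scan_queries("my_ks", "my_table", "id", 4):
        print(q)
```

Each generated query can be handed to a separate worker, and the filtering for the ad hoc condition is then done client-side on the returned rows.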

I hope it helps.

Regards


-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John
A follow up question is: did you use Spark for the full set of data or just 
partial? In our case, I feel we need all the data to support ad hoc queries 
(with multiple conditional filters).

Thanks,
John



nodetool repair of large partition

2017-01-30 Thread Jimmy Lin
Hi,
If I have a row in a table that contains a large amount of data (not necessarily
a super wide row), say 10 GB, with a replication factor of 3:

During a repair, if the data of that row on each node differs by just
1 byte, is Cassandra smart enough to stream only part of the data
(maybe based on a range of the clustering key)?
Or does it have to stream all 10 GB of data from the other 2 nodes, then compare
and consolidate?

If it is the second case, besides a bandwidth usage spike, is there any other
negative impact on the Cassandra node?

Thanks


Re: No Host AvailableException during querying Cassandra.

2017-01-30 Thread Sikander Rafiq
Hi Admin,

Please remove my email from the list. thanks.

Sikander





From: Venkata D 
Sent: Friday, January 27, 2017 10:01 PM
To: user@cassandra.apache.org
Subject: No Host AvailableException during querying Cassandra.

Hello All,

We are using DSE 4.6.6 & Cassandra 2.0.14.425.

I am facing this exception right now. We have hit it a couple of times before, and 
repair jobs helped us temporarily.

As the data is growing significantly, we are now seeing this exception more 
and more often. Does anyone have any thoughts on this?


Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 114 in stage 17.0 failed 4 times, most recent failure: Lost 
task 114.3 in stage 17.0 (TID 196, ): 
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried 
for query failed (tried: [All IP addresses] - use getErrors() for details)

com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:279)
com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:239)
com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:122)
com.datastax.spark.connector.rdd.reader.PrefetchingResultSetIterator.hasNext(PrefetchingResultSetIterator.scala:16)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:10)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)


Thanks,
Venkat.


Re: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-30 Thread Bhuvan Rawal
Hi Abhishek,

nodetool status output can be misleading at times.
To ensure the data is in sync, schedule a repair for the impacted
keyspaces.

Regards,

On Mon, Jan 30, 2017 at 10:13 AM, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> But how will I tell the rebuild command the source DC if I have more than 2 DCs?
>
>
>
> @Dikang, yes, I ran the command, and now it has done something strange:
>
>
>
> Datacenter: DRPOCcluster
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns  Host ID                               Rack
> UN  172.29.XX.XXX  140.16 GB  256     ?     badf985b-37da-4735-b468-8d3a058d4b60  01
> UN  172.29.XX.XXX  82.04 GB   256     ?     317061b2-c19f-44ba-a776-bcd91c70bbdd  03
> UN  172.29.XX.XXX  85.29 GB   256     ?     9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c  02
>
> Datacenter: dc_india
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns  Host ID                               Rack
> UN  172.26.XX.XXX  79.09 GB   256     ?     3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
> UN  172.26.XX.XXX  79.39 GB   256     ?     7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
>
>
>
>
> In the source DC (dc_india) we have about 79 GB of data. But in the new DC each
> node has more than 79 GB of data, and the seed node has nearly twice as much.
> Below is the replication:
>
> Data Key Space:
>
> alter KEYSPACE wls WITH replication = {'class': 'NetworkTopologyStrategy',
> 'DRPOCcluster': '3','dc_india':'2'}  AND durable_writes = true;
>
> alter KEYSPACE adlog WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'2'}  AND
> durable_writes = true;
>
>
>
> New DC('DRPOCcluster') system Key Space:
>
>
>
> alter KEYSPACE system_distributed WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_auth WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_traces WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
> alter KEYSPACE "OpsCenter" WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND
> durable_writes = true;
>
>
>
> Old  DC(‘dc_india’) system Key Space:
>
>
>
> alter KEYSPACE system_distributed WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_auth WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
> alter KEYSPACE system_traces WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
> alter KEYSPACE "OpsCenter" WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND
> durable_writes = true;
>
>
>
> Why is this happening? Did I do something wrong?
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
>
>
>
> *From:* kurt greaves [mailto:k...@instaclustr.com]
> *Sent:* Saturday, January 28, 2017 3:27 AM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: [Multi DC] Old Data Not syncing from Existing cluster to
> new Cluster
>
>
>
> What Dikang said, in your original email you are passing -dc to rebuild.
> This is incorrect. Simply run nodetool rebuild  from each of the
> nodes in the new dc.
>
>
>
> On 28 Jan 2017 07:50, "Dikang Gu"  wrote:
>
> Have you run 'nodetool rebuild dc_india' on the new nodes?
>
>
>
> On Tue, Jan 24, 2017 at 7:51 AM, Benjamin Roth 
> wrote:
>
> Have you also altered RF of system_distributed as stated in the tutorial?
>
>
>
> 2017-01-24 16:45 GMT+01:00 Abhishek Kumar Maheshwari:
>
> My Mistake,
>
>
>
> Both clusters are up and running.
>
>
>
> Datacenter: DRPOCcluster
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load     Tokens  Owns  Host ID                               Rack
> UN  172.29.XX.XX  1.65 GB  256     ?     badf985b-37da-4735-b468-8d3a058d4b60  01
> UN  172.29.XX.XX  1.64 GB  256     ?     317061b2-c19f-44ba-a776-bcd91c70bbdd  03
> UN  172.29.XX.XX  1.64 GB  256     ?     9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c  02
>
> Datacenter: dc_india
>
> 
>
> 

RE: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread Yu, John
Thanks for the input! Are you using the DataStax connector as well? Does it 
support querying against any column well (not just clustering columns)? I’m 
wondering how it could build the index around them “on-the-fly”.

Regards,
John
