Re: Decommissioned nodes show as DOWN in Cassandra version 3.10

2017-06-12 Thread Vladimir Yudovin
Hi,

You can use nodetool removenode:

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRemoveNode.html



or, if this doesn't work, nodetool assassinate ("It is a last resort tool if you
cannot successfully use nodetool removenode."):

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsAssassinate.html
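
As a rough sketch of the order of operations: nodetool removenode takes the
host ID of the node to remove, and nodetool assassinate takes its IP address.
The helper below is hypothetical and only builds the command lines without
running them; the host ID and IP address are placeholders, not values from
this cluster.

```python
# Hypothetical helper: builds (but does not run) the nodetool command lines.
def removal_commands(host_id, ip_address):
    """Return the commands to try, in order, for a node stuck in gossip."""
    return [
        # First choice: remove the node by its host ID.
        ["nodetool", "removenode", host_id],
        # Last resort, only if removenode cannot complete.
        ["nodetool", "assassinate", ip_address],
    ]

# Placeholder values for illustration only.
for cmd in removal_commands("00000000-0000-0000-0000-000000000000", "10.0.0.1"):
    print(" ".join(cmd))
```

Run the second command only if the first one cannot complete.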



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 12 Jun 2017 15:15:33 -0400 pabbireddy avinash 
pabbireddyavin...@gmail.com wrote 




Hi

In Cassandra 3.10, after we decommission a node or datacenter, we observe
the decommissioned nodes marked as DOWN in the cluster when we run
"nodetool describecluster". The nodes do not, however, show up in the output
of "nodetool status".
The decommissioned nodes also do not show up in the "system.peers" table on
the remaining nodes.
The workaround we follow is a rolling restart of the cluster, which removes the
decommissioned nodes from the "UNREACHABLE" state and shows the actual state
of the cluster. This workaround is tedious for huge clusters.



Has anybody in the community observed a similar issue?

Below are the observed logs

2017-06-12 18:23:29,209 [RMI TCP Connection(8)-127.0.0.1] INFO 
StorageService.java:3938 - Announcing that I have left the ring for 3ms
 2017-06-12 18:23:59,210 [RMI TCP Connection(8)-127.0.0.1] INFO 
ThriftServer.java:139 - Stop listening to thrift clients
 2017-06-12 18:23:59,215 [RMI TCP Connection(8)-127.0.0.1] INFO Server.java:176 
- Stop listening for CQL clients
 2017-06-12 18:23:59,216 [RMI TCP Connection(8)-127.0.0.1] WARN 
Gossiper.java:1514 - No local state, state is in silent shutdown, or node 
hasn't joined, not announcing shutdown
 2017-06-12 18:23:59,216 [RMI TCP Connection(8)-127.0.0.1] INFO 
MessagingService.java:964 - Waiting for messaging service to quiesce
 2017-06-12 18:23:59,217 [ACCEPT-/96.115.209.228] INFO 
MessagingService.java:1314 - MessagingService has terminated the accept() thread
 2017-06-12 18:23:59,263 [RMI TCP Connection(8)-127.0.0.1] INFO 
StorageService.java:1435 - DECOMMISSIONED




Regards,
Avinash.








Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
Yes, I am not planning to go with MVs; I will try to implement this myself.
Maybe I will get some ideas by running cassandra-stress for data
generation and so on.
Thanks, Jonathan.

On Tue, Jun 13, 2017 at 10:44 AM, Jonathan Haddad  wrote:

> Unless you're willing to put in a lot of time fixing bugs, I'd recommend
> avoiding 3.0's materialized views and manage them yourself.
>
> On Mon, Jun 12, 2017 at 6:11 PM @Nandan@ 
> wrote:
>
>> Correct, Our first concern is to store huge READ and WRITE, for that
>> Cassandra is our First and Best Choice. But according to Use Case, we need
>> to implement Advance search like Partial text, Phrase search etc.. So we
>> are thinking the best way, that how to implement data model.
>>
>>
>> On Tue, Jun 13, 2017 at 3:35 AM, Oskar Kjellin 
>> wrote:
>>
>>> Agree, I meant as Jonathan said to use C* for primary key and as a
>>> primary storage and ES as an indexed version of what you have in cassandra.
>>>
>>> 2017-06-12 19:19 GMT+02:00 DuyHai Doan :
>>>
 Sorry, I misread some reply I had the impression that people recommend
 ES as primary datastore

 On Mon, Jun 12, 2017 at 7:12 PM, Jonathan Haddad 
 wrote:

> Nobody is promoting ES as a primary datastore in this thread.  Every
> mention of it is to accompany C*.
>
>
>
> On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan 
> wrote:
>
>> For all those promoting ES as a PRIMARY datastore, please read this
>> before:
>>
>> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13
>>
>> There are a lot of warning before recommending ES as a datastore.
>>
>> The answer from Pilato, ES official evangelist:
>>
>>
>>    - You absolutely care about your data and you want to be able to
>>      reindex in all cases. You need for that a datastore. A datastore can
>>      be a filesystem where you store JSON, HDFS, and/or a database you
>>      prefer and you are confident with. About how to inject data in it,
>>      you may want to read:
>>      http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
>>
>>
>>
>>
>> On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior 
>> wrote:
>>
>>> For queries 1-5 this seems like a potentially good use case for
>>> materialized views. Create one table with the videos stored by ID and 
>>> the
>>> materialized views for each of the queries.
>>>
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>>
>>>
>>> 2017-06-11 22:40 GMT-04:00 @Nandan@ 
>>> :
>>>
 Hi,

 Currently, I am working on data modeling for Video Company in which
 we have different types of users as well as different user 
 functionality.
 But currently, my concern is about Search video module based on
 different fields.

 Query patterns are as below:-
 1) Select video by actor.
 2) select video by producer.
 3) select video by music.
 4) select video by actor and producer.
 5) select video by actor and music.

 Note: - In short, We want to establish an advanced search module by
 which we can search by anyway and get the desired results.

 During a search , we need partial search also such that if any user
 can search "Harry" title, then we are able to give them result as all
 videos whose
  title contains "Harry" at any location.

 As per my ideas, I have to create separate tables such as
 video_by_actor, video_by_producer etc.. and implement solr query on all
 tables. Otherwise,
 is there any others way by which we can implement this search
 module effectively.

 Please suggest.

 Best regards,

>>>
>>>
>>

>>>
>>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Jonathan Haddad
Unless you're willing to put in a lot of time fixing bugs, I'd recommend
avoiding 3.0's materialized views and managing them yourself.
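
Managing the views yourself usually means the application writes each video
to the base table and to one query table per access pattern. Here is a minimal
in-memory sketch of that idea, with plain Python dicts standing in for
Cassandra tables. The table names follow the video_by_actor /
video_by_producer naming from the original question; the class, method names
and sample rows are made up for illustration.

```python
from collections import defaultdict

class VideoStore:
    """In-memory stand-in for a base table plus hand-maintained query tables."""

    def __init__(self):
        self.videos = {}                           # videoid -> row
        self.video_by_actor = defaultdict(set)     # actor -> {videoid}
        self.video_by_producer = defaultdict(set)  # producer -> {videoid}

    def insert(self, videoid, title, actors, producers):
        # The application, not the database, fans the write out to every table.
        self.videos[videoid] = {"title": title, "actors": actors, "producers": producers}
        for actor in actors:
            self.video_by_actor[actor].add(videoid)
        for producer in producers:
            self.video_by_producer[producer].add(videoid)

    def by_actor_and_producer(self, actor, producer):
        # Query 4 from the original post: intersect the two lookups.
        return self.video_by_actor[actor] & self.video_by_producer[producer]

store = VideoStore()
store.insert("v1", "Example Film", ["actor_a", "actor_b"], ["producer_x"])
store.insert("v2", "Another Film", ["actor_a"], ["producer_y"])
print(sorted(store.by_actor_and_producer("actor_a", "producer_x")))  # ['v1']
```

In a real cluster each of these writes would be a separate INSERT, and
handling a partial failure between them is left to the application.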

On Mon, Jun 12, 2017 at 6:11 PM @Nandan@ 
wrote:

> Correct, Our first concern is to store huge READ and WRITE, for that
> Cassandra is our First and Best Choice. But according to Use Case, we need
> to implement Advance search like Partial text, Phrase search etc.. So we
> are thinking the best way, that how to implement data model.
>
>
> On Tue, Jun 13, 2017 at 3:35 AM, Oskar Kjellin 
> wrote:
>
>> Agree, I meant as Jonathan said to use C* for primary key and as a
>> primary storage and ES as an indexed version of what you have in cassandra.
>>
>> 2017-06-12 19:19 GMT+02:00 DuyHai Doan :
>>
>>> Sorry, I misread some reply I had the impression that people recommend
>>> ES as primary datastore
>>>
>>> On Mon, Jun 12, 2017 at 7:12 PM, Jonathan Haddad 
>>> wrote:
>>>
 Nobody is promoting ES as a primary datastore in this thread.  Every
 mention of it is to accompany C*.



 On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan 
 wrote:

> For all those promoting ES as a PRIMARY datastore, please read this
> before:
>
>
> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13
>
> There are a lot of warning before recommending ES as a datastore.
>
> The answer from Pilato, ES official evangelist:
>
>
>    - You absolutely care about your data and you want to be able to
>      reindex in all cases. You need for that a datastore. A datastore can
>      be a filesystem where you store JSON, HDFS, and/or a database you
>      prefer and you are confident with. About how to inject data in it,
>      you may want to read:
>      http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
>
>
>
>
> On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior 
> wrote:
>
>> For queries 1-5 this seems like a potentially good use case for
>> materialized views. Create one table with the videos stored by ID and the
>> materialized views for each of the queries.
>>
>> --
>> Michael Mior
>> mm...@apache.org
>>
>>
>> 2017-06-11 22:40 GMT-04:00 @Nandan@ :
>>
>>> Hi,
>>>
>>> Currently, I am working on data modeling for Video Company in which
>>> we have different types of users as well as different user 
>>> functionality.
>>> But currently, my concern is about Search video module based on
>>> different fields.
>>>
>>> Query patterns are as below:-
>>> 1) Select video by actor.
>>> 2) select video by producer.
>>> 3) select video by music.
>>> 4) select video by actor and producer.
>>> 5) select video by actor and music.
>>>
>>> Note: - In short, We want to establish an advanced search module by
>>> which we can search by anyway and get the desired results.
>>>
>>> During a search , we need partial search also such that if any user
>>> can search "Harry" title, then we are able to give them result as all
>>> videos whose
>>>  title contains "Harry" at any location.
>>>
>>> As per my ideas, I have to create separate tables such as
>>> video_by_actor, video_by_producer etc.. and implement solr query on all
>>> tables. Otherwise,
>>> is there any others way by which we can implement this search module
>>> effectively.
>>>
>>> Please suggest.
>>>
>>> Best regards,
>>>
>>
>>
>
>>>
>>
>


Re: Bottleneck for small inserts?

2017-06-12 Thread Eric Pederson
Hi all - I wanted to follow up on this.  I'm happy with the throughput
we're getting but I'm still curious about the bottleneck.

The big thing that sticks out is one of the nodes is logging frequent
GCInspector messages: 350-500ms every 3-6 seconds.  All three nodes in the
cluster have identical Cassandra configuration, but the node that is
logging frequent GCs is an older machine with slower CPU and SSD.  This
node logs frequent GCInspectors both under load and when compacting but
otherwise unloaded.

My theory is that the other two nodes have similar GC frequency (because
they are seeing the same basic load), but because they are faster machines,
they don't spend as much time per GC and don't cross the GCInspector
threshold.  Does that sound plausible?   nodetool tpstats doesn't show any
queueing in the system.
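
The theory can be illustrated with a toy calculation: two nodes with the same
number of collections, where only the slower node's pauses exceed the logging
threshold. The 200 ms threshold and the pause durations below are assumptions
for illustration, not measurements from this cluster.

```python
# Assumed logging threshold in milliseconds (illustrative).
GC_LOG_THRESHOLD_MS = 200

def logged_pauses(pauses_ms, threshold_ms=GC_LOG_THRESHOLD_MS):
    """Return only the pauses long enough to be reported."""
    return [p for p in pauses_ms if p > threshold_ms]

# Made-up pause durations: same GC count per node, different speed.
fast_node = [120, 150, 140, 160]
slow_node = [350, 420, 500, 380]

print(len(fast_node), len(logged_pauses(fast_node)))  # 4 0
print(len(slow_node), len(logged_pauses(slow_node)))  # 4 4
```

If this is what is happening, comparing the full GC logs of the three nodes
(rather than only the GCInspector lines) should show similar collection
counts on each.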

Here are flamegraphs from the system when running a cqlsh COPY FROM:

   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_cars_batch2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_cars_batch2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_cars_batch2.svg

The slow node (ultva03) spends a disproportionate amount of time in GC.

Thanks,


-- Eric

On Thu, May 25, 2017 at 8:09 PM, Eric Pederson  wrote:

> Due to a cut and paste error those flamegraphs were a recording of the
> whole system, not just Cassandra.  Throughput is approximately 30k
> rows/sec.
>
> Here's the graphs with just the Cassandra PID:
>
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars2.svg
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars2.svg
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars2.svg
>
> And here's graphs during a cqlsh COPY FROM to the same table, using real
> data, MAXBATCHSIZE=2.  Throughput is good at approximately 110k
> rows/sec.
>
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_cars_batch2.svg
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_cars_batch2.svg
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_cars_batch2.svg
>
>
>
>
> -- Eric
>
> On Thu, May 25, 2017 at 6:44 PM, Eric Pederson  wrote:
>
>> Totally understood :)
>>
>> I forgot to mention - I set the /proc/irq/*/smp_affinity mask to include
>> all of the CPUs.  Actually most of them were set that way already (for
>> example, ,) - it might be because irqbalanced is
>> running.  But for some reason the interrupts are all being handled on CPU 0
>> anyway.
>>
>> I see this in /var/log/dmesg on the machines:
>>
>>>
>>> Your BIOS has requested that x2apic be disabled.
>>> This will leave your machine vulnerable to irq-injection attacks.
>>> Use 'intremap=no_x2apic_optout' to override BIOS request.
>>> Enabled IRQ remapping in xapic mode
>>> x2apic not enabled, IRQ remapping is in xapic mode
>>
>>
>> In a reply to one of the comments, he says:
>>
>>
>> When IO-APIC configured to spread interrupts among all cores, it can
>>> handle up to eight cores. If you have more than eight cores, kernel will
>>> not configure IO-APIC to spread interrupts. Thus the trick I described in
>>> the article will not work.
>>> Otherwise it may be caused by buggy BIOS or even buggy hardware.
>>
>>
>> I'm not sure if either of them is relevant to my situation.
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>> -- Eric
>>
>> On Thu, May 25, 2017 at 4:16 PM, Jonathan Haddad 
>> wrote:
>>
>>> You shouldn't need a kernel recompile.  Check out the section "Simple
>>> solution for the problem" in
>>> http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux.
>>> You can balance your requests across up to 8 CPUs.
>>>
>>> I'll check out the flame graphs in a little bit - in the middle of
>>> something and my 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
Correct. Our first concern is handling a huge volume of reads and writes, and
for that Cassandra is our first and best choice. But our use case also
requires advanced search, such as partial-text and phrase search, so we are
working out the best way to design the data model.
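
To illustrate why partial-text search is usually handed to a search index
rather than a primary-key lookup, here is a toy substring index built on
3-character fragments (trigrams). It is only a sketch: Lucene-based tools
such as Solr or Elasticsearch use their own analyzers and index structures,
and the class name and sample titles are made up.

```python
from collections import defaultdict

def trigrams(text):
    """Lower-cased 3-character substrings of text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

class TitleIndex:
    def __init__(self):
        self.titles = {}                  # videoid -> title
        self.postings = defaultdict(set)  # trigram -> {videoid}

    def add(self, videoid, title):
        self.titles[videoid] = title
        for gram in trigrams(title):
            self.postings[gram].add(videoid)

    def search(self, term):
        grams = trigrams(term)
        if not grams:
            # Terms shorter than 3 characters: fall back to a full scan.
            candidates = set(self.titles)
        else:
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Verify candidates, since sharing trigrams does not guarantee a match.
        needle = term.lower()
        return sorted(v for v in candidates if needle in self.titles[v].lower())

index = TitleIndex()
index.add("v1", "Harry and the Example")
index.add("v2", "The Life of Harry")
index.add("v3", "Something Else")
print(index.search("Harry"))  # ['v1', 'v2']
```

The point of the sketch is that "contains" queries need a secondary structure
keyed on fragments of the text, which a table keyed only on videoid cannot
provide.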


On Tue, Jun 13, 2017 at 3:35 AM, Oskar Kjellin 
wrote:

> Agree, I meant as Jonathan said to use C* for primary key and as a primary
> storage and ES as an indexed version of what you have in cassandra.
>
> 2017-06-12 19:19 GMT+02:00 DuyHai Doan :
>
>> Sorry, I misread some reply I had the impression that people recommend ES
>> as primary datastore
>>
>> On Mon, Jun 12, 2017 at 7:12 PM, Jonathan Haddad 
>> wrote:
>>
>>> Nobody is promoting ES as a primary datastore in this thread.  Every
>>> mention of it is to accompany C*.
>>>
>>>
>>>
>>> On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan 
>>> wrote:
>>>
 For all those promoting ES as a PRIMARY datastore, please read this
 before:

 https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13

 There are a lot of warning before recommending ES as a datastore.

 The answer from Pilato, ES official evangelist:


    - You absolutely care about your data and you want to be able to
      reindex in all cases. You need for that a datastore. A datastore can
      be a filesystem where you store JSON, HDFS, and/or a database you
      prefer and you are confident with. About how to inject data in it,
      you may want to read:
      http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/




 On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior 
 wrote:

> For queries 1-5 this seems like a potentially good use case for
> materialized views. Create one table with the videos stored by ID and the
> materialized views for each of the queries.
>
> --
> Michael Mior
> mm...@apache.org
>
>
> 2017-06-11 22:40 GMT-04:00 @Nandan@ :
>
>> Hi,
>>
>> Currently, I am working on data modeling for Video Company in which
>> we have different types of users as well as different user functionality.
>> But currently, my concern is about Search video module based on
>> different fields.
>>
>> Query patterns are as below:-
>> 1) Select video by actor.
>> 2) select video by producer.
>> 3) select video by music.
>> 4) select video by actor and producer.
>> 5) select video by actor and music.
>>
>> Note: - In short, We want to establish an advanced search module by
>> which we can search by anyway and get the desired results.
>>
>> During a search , we need partial search also such that if any user
>> can search "Harry" title, then we are able to give them result as all
>> videos whose
>>  title contains "Harry" at any location.
>>
>> As per my ideas, I have to create separate tables such as
>> video_by_actor, video_by_producer etc.. and implement solr query on all
>> tables. Otherwise,
>> is there any others way by which we can implement this search module
>> effectively.
>>
>> Please suggest.
>>
>> Best regards,
>>
>
>

>>
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
Hi Michael,
MVs are also a good option when we select based on equality search, but
here the requirement is to develop a model for advanced partial search.
Also, in the case of MVs, suppose we have 2 DCs with 3 nodes in each DC; then
the MV data will be replicated 6*6 times, which will be another problem.


On Mon, Jun 12, 2017 at 11:08 PM, Michael Mior  wrote:

> For queries 1-5 this seems like a potentially good use case for
> materialized views. Create one table with the videos stored by ID and the
> materialized views for each of the queries.
>
> --
> Michael Mior
> mm...@apache.org
>
>
> 2017-06-11 22:40 GMT-04:00 @Nandan@ :
>
>> Hi,
>>
>> Currently, I am working on data modeling for Video Company in which we
>> have different types of users as well as different user functionality.
>> But currently, my concern is about Search video module based on different
>> fields.
>>
>> Query patterns are as below:-
>> 1) Select video by actor.
>> 2) select video by producer.
>> 3) select video by music.
>> 4) select video by actor and producer.
>> 5) select video by actor and music.
>>
>> Note: - In short, We want to establish an advanced search module by which
>> we can search by anyway and get the desired results.
>>
>> During a search , we need partial search also such that if any user can
>> search "Harry" title, then we are able to give them result as all videos
>> whose
>>  title contains "Harry" at any location.
>>
>> As per my ideas, I have to create separate tables such as video_by_actor,
>> video_by_producer etc.. and implement solr query on all tables. Otherwise,
>> is there any others way by which we can implement this search module
>> effectively.
>>
>> Please suggest.
>>
>> Best regards,
>>
>
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
OK, then let's try to implement it and use cassandra-stress to check what
the performance will be.
I worked on another data model, for book storage at my company, in a similar
situation: one single table with 80 columns and bookid uuid as the primary
key, with Solr implemented on top of it.  That is why I am trying to find
the best possible solution for upcoming projects.


On Mon, Jun 12, 2017 at 7:51 PM, Eduardo Alonso 
wrote:

> -Virtual tokens are not recommended when using SOLR or
> cassandra-lucene-index.
>
> If you use your table schema you will not have any problem with partition
> size because your table is *not* a WIDE row table (it does not have
> clustering keys)
> The limit for 1 record with those 15 or 20 columns must not be larger than
> 100MB. You will have enough.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>
> 2017-06-12 12:36 GMT+02:00 @Nandan@ :
>
>> And due to single table videos, maybe it will go with around 15,20
>> columns, then we need to also think very carefully about partition sizes
>> also.
>>
>> On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ wrote:
>>
>>> Yes this is only Option I am also thinking like this as my second
>>> options. Before this I was thinking to do denormalize table based on search
>>> columns, but due to partial search this will be not that effective.
>>>
>>> Now suppose , if we are going with this single table as videos. and
>>> implemented with Solr/Lucene, then need to also care about num_tokens ?
>>>
>>>
>>> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso <
>>> eduardoalo...@stratio.com> wrote:
>>>
 Using cassandra collections

 CREATE TABLE videos (
 videoid uuid primary key,
 title text,
 actor list<text>,
 producer list<text>,
 release_date timestamp,
 description text,
 music text,
 etc...
 );

 When using collections you need to take care of their length. Collections
 are designed to store only a small amount of data.
 5 to 10 actors per movie is OK.


 Eduardo Alonso
 Vía de las dos Castillas, 33, Ática 4, 3ª Planta
 28224 Pozuelo de Alarcón, Madrid
 Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

 2017-06-12 11:54 GMT+02:00 @Nandan@ :

> So In short we have to go with one single table as videos and put
> primary key as videoid uuid.
> But then how can we able to handle multiple actor name and producer
> name. ?
>
> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
> eduardoalo...@stratio.com> wrote:
>
>> Yes, you are right.
>>
>> Table denormalization is useful just when you have unique primary
>> keys, not your case.
>> Denormalized tables are only different in its primary key, every
>> denormalized table contains all the data (it just change how it is
>> structured). So, if you need to index it, do it with just one table (the
>> one you showed us with videoid as the primary key is ok).
>>
>> Solr, Elastic and cassandra-lucene-index are all based on Lucene and
>> all of them fulfill all your needs.
>>
>> Solr (in DSE) and cassandra-lucene-index are very well
>> integrated with cassandra using its secondary index interface. If you
>> choose elastic search you will need to code the integration (write mutex,
>> both cluster synchronization (imagine something written in cassandra but
>> failed to write in elastic))
>>
>> I know i am not the most suitable to recommend you to use our product
>> cassandra-lucene-index but it is open
>> source, just take a look.
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>>
>> 2017-06-12 11:18 GMT+02:00 @Nandan@ :
>>
>>> Hi Eduardo,
>>>
>>> And as we are trying to build an advanced search functionality in
>>> which we are able to do partial search based on actor, producer,
>>> director, etc. columns.
>>> So if we do denormalization of tables then we have to create tables
>>> such as below :-
>>> video_by_actor
>>> video_by_producer
>>> video_by_director

Re: Convert single node C* to cluster (rebalancing problem)

2017-06-12 Thread Akhil Mehra
Great point John.

The OP should also note that data distribution depends on your schema
and incoming data profile.

If your schema is not modelled correctly you can easily end up with unevenly
distributed data.
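
A toy illustration of that point: in a hash-partitioned store, rows spread
evenly only when the partition keys are varied. The md5-modulo placement
below is a simplification standing in for Cassandra's token ring (which uses
Murmur3 tokens and token ranges by default), and the key names are made up.

```python
import hashlib
from collections import Counter

def node_for(partition_key, num_nodes):
    """Map a partition key to a node via a stable hash (simplified)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def rows_per_node(partition_keys, num_nodes=2):
    """Count how many rows each node would receive."""
    return Counter(node_for(k, num_nodes) for k in partition_keys)

# Well-modelled: many distinct partition keys, one row each.
spread = rows_per_node(f"sensor-{i}" for i in range(10_000))
# Poorly modelled: every row lands in the same partition.
skewed = rows_per_node("all-sensors" for _ in range(10_000))

print(sorted(spread.values()))
print(sorted(skewed.values()))  # [10000]
```

With many distinct keys the two counts come out close to each other; with a
single hot key one node holds everything, regardless of how many tokens each
node owns.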

Cheers,
Akhil

On Tue, Jun 13, 2017 at 3:36 AM, John Hughes  wrote:

> Is the OP expecting a perfect 50%/50% split? That, to my experience, is
> not going to happen, it is almost always shifted from a fraction of a
> percent to a couple percent.
>
> Datacenter: eu-west
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  XX.XX.XX.XX  22.71 GiB  256     47.6%             57dafdde-2f62-467c-a8ff-c91e712f89c9  1c
> UN  XX.XX.XX.XX  17.17 GiB  256     51.3%             d2a65c51-087d-48de-ae1f-a41142eb148d  1b
> UN  XX.XX.XX.XX  26.15 GiB  256     52.4%             acf5dd34-5b81-4e5b-b7be-85a7fccd8e1c  1c
> UN  XX.XX.XX.XX  16.64 GiB  256     50.2%             6c8842dd-a966-467c-a7bc-bd6269ce3e7e  1a
> UN  XX.XX.XX.XX  24.39 GiB  256     49.8%             fd92525d-edf2-4974-8bc5-a350a8831dfa  1a
> UN  XX.XX.XX.XX  23.8 GiB   256     48.7%             bdc597c0-718c-4ef6-b3ef-7785110a9923  1b
>
> Though maybe part of what you are experiencing can be cleared up by
> repair/compaction/cleanup. Also, what are your outputs when you call out
> specific keyspaces? Do the numbers get more even?
>
> Cheers,
>
> On Mon, Jun 12, 2017 at 5:22 AM Akhil Mehra  wrote:
>
>> auto_bootstrap is true by default. Ensure it's set to true. On startup
>> look at your logs for your auto_bootstrap value.  Look at the node
>> configuration line in your log file.
>>
>> Akhil
>>
>> On Mon, Jun 12, 2017 at 6:18 PM, Junaid Nasir  wrote:
>>
>>> No, I didn't set it (left it at default value)
>>>
>>> On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A  wrote:
>>>
 Did you make sure auto_bootstrap property is indeed set to [true] when
 you added the node?



 *From:* Junaid Nasir [mailto:jna...@an10.io]
 *Sent:* Monday, June 05, 2017 6:29 AM
 *To:* Akhil Mehra 
 *Cc:* Vladimir Yudovin ;
 user@cassandra.apache.org
 *Subject:* Re: Convert single node C* to cluster (rebalancing problem)



 not evenly, i have setup a new cluster with subset of data (around
 5gb). using the configuration above I am getting these results



 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
 UN  10.128.2.1   4.86 GiB    256     44.9%             e4427611-c247-42ee-9404-371e177f5f17  rack1
 UN  10.128.2.10  725.03 MiB  256     55.1%             690d5620-99d3-4ae3-aebe-8f33af54a08b  rack1

 is there anything else I can tweak/check to make the distribution even?



 On Sat, Jun 3, 2017 at 3:30 AM, Akhil Mehra 
 wrote:

 So now the data is evenly balanced in both nodes?



 Refer to the following documentation to get a better understanding of
 the rpc_address and the broadcast_rpc_address:
 https://www.instaclustr.com/demystifying-cassandras-broadcast_address/
 I am surprised that your node started up with rpc_broadcast_address
 set as this is an unsupported property. I am assuming you are using
 Cassandra version 3.10.





 Regards,

 Akhil



 On 2/06/2017, at 11:06 PM, Junaid Nasir  wrote:



 I am able to get it working. I added a new node with following changes

 #rpc_address:0.0.0.0

 rpc_address: 10.128.1.11

 #rpc_broadcast_address:10.128.1.11

 rpc_address was set to 0.0.0.0, (I ran into a problem previously
 regarding remote connection and made these changes
 https://stackoverflow.com/questions/12236898/apache-cassandra-remote-access
 )



 should it be happening?



 On Thu, Jun 1, 2017 at 6:31 PM, Vladimir Yudovin 
 wrote:

 Did you run "nodetool cleanup" on first node after second was
 bootstrapped? It should clean rows not 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Oskar Kjellin
Agreed. As Jonathan said, I meant using C* for primary-key access and as the
primary storage, with ES as an indexed version of what you have in Cassandra.

2017-06-12 19:19 GMT+02:00 DuyHai Doan :

> Sorry, I misread some reply I had the impression that people recommend ES
> as primary datastore
>
> On Mon, Jun 12, 2017 at 7:12 PM, Jonathan Haddad 
> wrote:
>
>> Nobody is promoting ES as a primary datastore in this thread.  Every
>> mention of it is to accompany C*.
>>
>>
>>
>> On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan 
>> wrote:
>>
>>> For all those promoting ES as a PRIMARY datastore, please read this
>>> before:
>>>
>>> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13
>>>
>>> There are a lot of warning before recommending ES as a datastore.
>>>
>>> The answer from Pilato, ES official evangelist:
>>>
>>>
>>>    - You absolutely care about your data and you want to be able to
>>>      reindex in all cases. You need for that a datastore. A datastore can
>>>      be a filesystem where you store JSON, HDFS, and/or a database you
>>>      prefer and you are confident with. About how to inject data in it,
>>>      you may want to read:
>>>      http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
>>>
>>>
>>>
>>>
>>> On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior 
>>> wrote:
>>>
 For queries 1-5 this seems like a potentially good use case for
 materialized views. Create one table with the videos stored by ID and the
 materialized views for each of the queries.

 --
 Michael Mior
 mm...@apache.org


 2017-06-11 22:40 GMT-04:00 @Nandan@ :

> Hi,
>
> Currently, I am working on data modeling for Video Company in which we
> have different types of users as well as different user functionality.
> But currently, my concern is about Search video module based on
> different fields.
>
> Query patterns are as below:-
> 1) Select video by actor.
> 2) select video by producer.
> 3) select video by music.
> 4) select video by actor and producer.
> 5) select video by actor and music.
>
> Note: - In short, We want to establish an advanced search module by
> which we can search by anyway and get the desired results.
>
> During a search , we need partial search also such that if any user
> can search "Harry" title, then we are able to give them result as all
> videos whose
>  title contains "Harry" at any location.
>
> As per my ideas, I have to create separate tables such as
> video_by_actor, video_by_producer etc.. and implement solr query on all
> tables. Otherwise,
> is there any others way by which we can implement this search module
> effectively.
>
> Please suggest.
>
> Best regards,
>


>>>
>


Decommissioned nodes show as DOWN in Cassandra version 3.10

2017-06-12 Thread pabbireddy avinash
Hi

In Cassandra 3.10, after we decommission a node or datacenter, we observe
the decommissioned nodes marked as DOWN in the cluster when we run
"nodetool describecluster". The nodes do not, however, show up in the
output of "nodetool status".
The decommissioned nodes also do not show up in the "system.peers" table
on the remaining nodes.

The workaround we follow is a rolling restart of the cluster, which removes
the decommissioned nodes from the "UNREACHABLE" state and shows the actual
state of the cluster. This workaround is tedious for huge clusters.


Has anybody in the community observed a similar issue?

Below are the observed logs

2017-06-12 18:23:29,209 [RMI TCP Connection(8)-127.0.0.1] INFO
StorageService.java:3938 - Announcing that I have left the ring for 3ms
2017-06-12 18:23:59,210 [RMI TCP Connection(8)-127.0.0.1] INFO
ThriftServer.java:139 - Stop listening to thrift clients
2017-06-12 18:23:59,215 [RMI TCP Connection(8)-127.0.0.1] INFO
Server.java:176 - Stop listening for CQL clients
2017-06-12 18:23:59,216 [RMI TCP Connection(8)-127.0.0.1] WARN
Gossiper.java:1514 - No local state, state is in silent shutdown, or node
hasn't joined, not announcing shutdown
2017-06-12 18:23:59,216 [RMI TCP Connection(8)-127.0.0.1] INFO
MessagingService.java:964 - Waiting for messaging service to quiesce
2017-06-12 18:23:59,217 [ACCEPT-/96.115.209.228] INFO
MessagingService.java:1314 - MessagingService has terminated the accept()
thread
2017-06-12 18:23:59,263 [RMI TCP Connection(8)-127.0.0.1] INFO
StorageService.java:1435 - DECOMMISSIONED



Regards,
Avinash.


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
Sorry, I misread some replies. I had the impression that people were
recommending ES as a primary datastore.

On Mon, Jun 12, 2017 at 7:12 PM, Jonathan Haddad  wrote:

> Nobody is promoting ES as a primary datastore in this thread.  Every
> mention of it is to accompany C*.
>
>
>
> On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan  wrote:
>
>> For all those promoting ES as a PRIMARY datastore, please read this
>> before:
>>
>> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13
>>
>> There are a lot of warning before recommending ES as a datastore.
>>
>> The answer from Pilato, ES official evangelist:
>>
>>
>>    - You absolutely care about your data and you want to be able to
>>      reindex in all cases. You need for that a datastore. A datastore can
>>      be a filesystem where you store JSON, HDFS, and/or a database you
>>      prefer and you are confident with. About how to inject data in it,
>>      you may want to read:
>>      http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
>>
>>
>>
>>
>> On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior  wrote:
>>
>>> For queries 1-5 this seems like a potentially good use case for
>>> materialized views. Create one table with the videos stored by ID and the
>>> materialized views for each of the queries.
>>>
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>>
>>>
>>> 2017-06-11 22:40 GMT-04:00 @Nandan@ :
>>>
>>>> Hi,
>>>>
>>>> Currently, I am working on data modeling for Video Company in which we
>>>> have different types of users as well as different user functionality.
>>>> But currently, my concern is about Search video module based on
>>>> different fields.
>>>>
>>>> Query patterns are as below:-
>>>> 1) Select video by actor.
>>>> 2) select video by producer.
>>>> 3) select video by music.
>>>> 4) select video by actor and producer.
>>>> 5) select video by actor and music.
>>>>
>>>> Note: - In short, We want to establish an advanced search module by
>>>> which we can search by anyway and get the desired results.
>>>>
>>>> During a search , we need partial search also such that if any user can
>>>> search "Harry" title, then we are able to give them result as all videos
>>>> whose
>>>>  title contains "Harry" at any location.
>>>>
>>>> As per my ideas, I have to create separate tables such as
>>>> video_by_actor, video_by_producer etc.. and implement solr query on all
>>>> tables. Otherwise,
>>>> is there any others way by which we can implement this search module
>>>> effectively.
>>>>
>>>> Please suggest.
>>>>
>>>> Best regards,
>>>>
>>>
>>>
>>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Jonathan Haddad
Nobody is promoting ES as a primary datastore in this thread.  Every
mention of it is to accompany C*.



On Mon, Jun 12, 2017 at 10:03 AM DuyHai Doan  wrote:

> For all those promoting ES as a PRIMARY datastore, please read this first:
>
> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13
>
> There are a lot of warnings to consider before recommending ES as a datastore.
>
> The answer from Pilato, ES official evangelist:
>
>
>    - You absolutely care about your data and you want to be able to
>    reindex in all cases. You need for that a datastore. A datastore can be a
>    filesystem where you store JSON, HDFS, and/or a database you prefer and you
>    are confident with. About how to inject data in it, you may want to read:
>    http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
>
>
>
>
> On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior  wrote:
>
>> For queries 1-5 this seems like a potentially good use case for
>> materialized views. Create one table with the videos stored by ID and the
>> materialized views for each of the queries.
>>
>> --
>> Michael Mior
>> mm...@apache.org
>>
>>
>> 2017-06-11 22:40 GMT-04:00 @Nandan@ :
>>
>>> Hi,
>>>
>>> Currently, I am working on data modeling for Video Company in which we
>>> have different types of users as well as different user functionality.
>>> But currently, my concern is about Search video module based on
>>> different fields.
>>>
>>> Query patterns are as below:-
>>> 1) Select video by actor.
>>> 2) select video by producer.
>>> 3) select video by music.
>>> 4) select video by actor and producer.
>>> 5) select video by actor and music.
>>>
>>> Note: - In short, We want to establish an advanced search module by
>>> which we can search by anyway and get the desired results.
>>>
>>> During a search , we need partial search also such that if any user can
>>> search "Harry" title, then we are able to give them result as all videos
>>> whose
>>>  title contains "Harry" at any location.
>>>
>>> As per my ideas, I have to create separate tables such as
>>> video_by_actor, video_by_producer etc.. and implement solr query on all
>>> tables. Otherwise,
>>> is there any others way by which we can implement this search module
>>> effectively.
>>>
>>> Please suggest.
>>>
>>> Best regards,
>>>
>>
>>
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
For all those promoting ES as a PRIMARY datastore, please read this first:

https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13

There are a lot of warnings to consider before recommending ES as a datastore.

The answer from Pilato, ES official evangelist:


   - You absolutely care about your data and you want to be able to reindex
   in all cases. You need for that a datastore. A datastore can be a
   filesystem where you store JSON, HDFS, and/or a database you prefer and you
   are confident with. About how to inject data in it, you may want to read:
   http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/




On Mon, Jun 12, 2017 at 5:08 PM, Michael Mior  wrote:

> For queries 1-5 this seems like a potentially good use case for
> materialized views. Create one table with the videos stored by ID and the
> materialized views for each of the queries.
>
> --
> Michael Mior
> mm...@apache.org
>
>
> 2017-06-11 22:40 GMT-04:00 @Nandan@ :
>
>> Hi,
>>
>> Currently, I am working on data modeling for Video Company in which we
>> have different types of users as well as different user functionality.
>> But currently, my concern is about Search video module based on different
>> fields.
>>
>> Query patterns are as below:-
>> 1) Select video by actor.
>> 2) select video by producer.
>> 3) select video by music.
>> 4) select video by actor and producer.
>> 5) select video by actor and music.
>>
>> Note: - In short, We want to establish an advanced search module by which
>> we can search by anyway and get the desired results.
>>
>> During a search , we need partial search also such that if any user can
>> search "Harry" title, then we are able to give them result as all videos
>> whose
>>  title contains "Harry" at any location.
>>
>> As per my ideas, I have to create separate tables such as video_by_actor,
>> video_by_producer etc.. and implement solr query on all tables. Otherwise,
>> is there any others way by which we can implement this search module
>> effectively.
>>
>> Please suggest.
>>
>> Best regards,
>>
>
>


Re: Convert single node C* to cluster (rebalancing problem)

2017-06-12 Thread John Hughes
Is the OP expecting a perfect 50%/50% split? In my experience that is not
going to happen; ownership is almost always off by somewhere between a
fraction of a percent and a couple of percent.

Datacenter: eu-west
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  XX.XX.XX.XX  22.71 GiB  256     47.6%             57dafdde-2f62-467c-a8ff-c91e712f89c9  1c
UN  XX.XX.XX.XX  17.17 GiB  256     51.3%             d2a65c51-087d-48de-ae1f-a41142eb148d  1b
UN  XX.XX.XX.XX  26.15 GiB  256     52.4%             acf5dd34-5b81-4e5b-b7be-85a7fccd8e1c  1c
UN  XX.XX.XX.XX  16.64 GiB  256     50.2%             6c8842dd-a966-467c-a7bc-bd6269ce3e7e  1a
UN  XX.XX.XX.XX  24.39 GiB  256     49.8%             fd92525d-edf2-4974-8bc5-a350a8831dfa  1a
UN  XX.XX.XX.XX  23.8 GiB   256     48.7%             bdc597c0-718c-4ef6-b3ef-7785110a9923  1b

Though maybe part of what you are experiencing can be cleared up by
repair/compaction/cleanup. Also, what are your outputs when you call out
specific keyspaces? Do the numbers get more even?
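
As a rough illustration of how far the figures in the listing sit from an
even split, the sketch below takes the six "Owns (effective)" percentages and
reports the mean and the largest deviation from it. The function name and the
choice of statistics are illustrative assumptions; nothing here is part of
nodetool itself.

```python
# Hypothetical helper: summarise how uneven a set of "Owns (effective)" values is.
from statistics import mean


def ownership_spread(owns_percent):
    """Return (mean, largest deviation from the mean) in percentage points."""
    avg = mean(owns_percent)
    worst = max(abs(p - avg) for p in owns_percent)
    return avg, worst


# Values copied from the six-node listing in this message.
owns = [47.6, 51.3, 52.4, 50.2, 49.8, 48.7]
avg, worst = ownership_spread(owns)
print(f"mean {avg:.1f}%, largest deviation {worst:.1f} points")
```

For this cluster the largest deviation is a little over two points, which is
in line with the "couple of percent" mentioned above.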

Cheers,

On Mon, Jun 12, 2017 at 5:22 AM Akhil Mehra  wrote:

> auto_bootstrap is true by default. Ensure it's set to true. On startup, check
> your logs for your auto_bootstrap value: look at the node configuration
> line in your log file.
>
> Akhil
>
> On Mon, Jun 12, 2017 at 6:18 PM, Junaid Nasir  wrote:
>
>> No, I didn't set it (left it at default value)
>>
>> On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A  wrote:
>>
>>> Did you make sure auto_bootstrap property is indeed set to [true] when
>>> you added the node?
>>>
>>>
>>>
>>> *From:* Junaid Nasir [mailto:jna...@an10.io]
>>> *Sent:* Monday, June 05, 2017 6:29 AM
>>> *To:* Akhil Mehra 
>>> *Cc:* Vladimir Yudovin ; user@cassandra.apache.org
>>> *Subject:* Re: Convert single node C* to cluster (rebalancing problem)
>>>
>>>
>>>
>>> Not evenly. I have set up a new cluster with a subset of the data (around
>>> 5 GB). Using the configuration above I am getting these results:
>>>
>>>
>>>
>>> Datacenter: datacenter1
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>>> UN  10.128.2.1   4.86 GiB    256     44.9%             e4427611-c247-42ee-9404-371e177f5f17  rack1
>>> UN  10.128.2.10  725.03 MiB  256     55.1%             690d5620-99d3-4ae3-aebe-8f33af54a08b  rack1
>>>
>>> is there anything else I can tweak/check to make the distribution even?
>>>
>>>
>>>
>>> On Sat, Jun 3, 2017 at 3:30 AM, Akhil Mehra 
>>> wrote:
>>>
>>> So now the data is evenly balanced in both nodes?
>>>
>>>
>>>
>>> Refer to the following documentation to get a better understanding of
>>> the rpc_address and the broadcast_rpc_address:
>>> https://www.instaclustr.com/demystifying-cassandras-broadcast_address/
>>> I am surprised that your node started up with rpc_broadcast_address set, as
>>> this is an unsupported property. I am assuming you are using Cassandra
>>> version 3.10.
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>>
>>> Akhil
>>>
>>>
>>>
>>> On 2/06/2017, at 11:06 PM, Junaid Nasir  wrote:
>>>
>>>
>>>
>>> I am able to get it working. I added a new node with following changes
>>>
>>> #rpc_address:0.0.0.0
>>>
>>> rpc_address: 10.128.1.11
>>>
>>> #rpc_broadcast_address:10.128.1.11
>>>
>>> rpc_address was set to 0.0.0.0 (I ran into a problem previously
>>> regarding remote connections and made these changes:
>>> https://stackoverflow.com/questions/12236898/apache-cassandra-remote-access
>>> )
>>>
>>>
>>>
>>> should it be happening?
>>>
>>>
>>>
>>> On Thu, Jun 1, 2017 at 6:31 PM, Vladimir Yudovin 
>>> wrote:
>>>
>>> Did you run "nodetool cleanup" on first node after second was
>>> bootstrapped? It should clean rows not belonging to node after tokens
>>> changed.
>>>
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone - Cloud Cassandra Hosting*
>>>
>>>
>>>
>>>
>>>
>>>  On Wed, 31 May 2017 03:55:54 -0400 *Junaid Nasir >> 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Michael Mior
For queries 1-5 this seems like a potentially good use case for
materialized views. Create one table with the videos stored by ID and the
materialized views for each of the queries.
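
A minimal sketch of what that could look like, written as a small Python
helper that only builds the CQL text. The table name videos, the key videoid,
and the columns actor, producer, and music are assumptions taken from the
schema discussed elsewhere in this thread, with each treated as a single text
column; the view names are made up. As far as I know, a Cassandra 3.x
materialized view can add only one non-key base column to its primary key, so
this covers queries 1-3 directly and not the combined queries 4 and 5.

```python
# Sketch: build CREATE MATERIALIZED VIEW statements for single-column lookups.
# Table, key, and column names are assumptions based on the schema in this thread.


def view_cql(base_table, base_key, column):
    """Return CQL text for a view of base_table partitioned by `column`."""
    return (
        f"CREATE MATERIALIZED VIEW {base_table}_by_{column} AS\n"
        f"  SELECT * FROM {base_table}\n"
        f"  WHERE {column} IS NOT NULL AND {base_key} IS NOT NULL\n"
        f"  PRIMARY KEY ({column}, {base_key});"
    )


statements = [view_cql("videos", "videoid", c) for c in ("actor", "producer", "music")]
print("\n\n".join(statements))
```

The base key is kept as a clustering column in each view so that two videos
with the same actor (or producer, or music) remain distinct rows.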

--
Michael Mior
mm...@apache.org


2017-06-11 22:40 GMT-04:00 @Nandan@ :

> Hi,
>
> Currently, I am working on data modeling for Video Company in which we
> have different types of users as well as different user functionality.
> But currently, my concern is about Search video module based on different
> fields.
>
> Query patterns are as below:-
> 1) Select video by actor.
> 2) select video by producer.
> 3) select video by music.
> 4) select video by actor and producer.
> 5) select video by actor and music.
>
> Note: - In short, We want to establish an advanced search module by which
> we can search by anyway and get the desired results.
>
> During a search , we need partial search also such that if any user can
> search "Harry" title, then we are able to give them result as all videos
> whose
>  title contains "Harry" at any location.
>
> As per my ideas, I have to create separate tables such as video_by_actor,
> video_by_producer etc.. and implement solr query on all tables. Otherwise,
> is there any others way by which we can implement this search module
> effectively.
>
> Please suggest.
>
> Best regards,
>


Re: Using Cassandra for my usecase

2017-06-12 Thread Eric Evans
On Sat, Jun 10, 2017 at 11:07 AM, Govindarajan Srinivasaraghavan
 wrote:
> Hi All,
>
> Just to give a background I'm working on a project where I need to store
> fast incoming time series data and have rest api's to query and serve the
> data to users when needed. The data as such is a single JSON which is 1kb in
> size and the data has to be purged after a specific time period (say few
> weeks or months). The incoming rate would be approximately 100k messages per
> second and the biggest challenge is the data should be query-able by
> multiple dimensions with sorting, paging and data dump options.
>
> I started looking into database options and felt like cassandra might be a
> good choice for my use case since the requirement needs faster writes. In
> order to query by multiple dimensions I had to insert the same record into
> multiple denormalized tables (around 8 tables). Now I need to implement
> multitenancy and having an extra column in the partition key to query by
> tenant will not work since there will be some tenants with huge amounts of
> data compared to the rest. My other option is to have the tenant identifier
> appended to the table names so that I can perform per tenant queries easily.
>
> Here are my questions for which I need some help.
> - Given my use case is cassandra the best suited one or is there any other
> database which suits my requirement better?
> - What would be best way to implement multi-tenancy?
> - Given that I need to query by multiple dimensions would denormalized
> tables work better or should I be using materialized views?
> - Anything else that I need to consider based on your experiences with
> cassandra?

You might have a look at newts (http://newts.io,
https://github.com/opennms/newts), I think it ticks all of the above
boxes.


-- 
Eric Evans
john.eric.ev...@gmail.com

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Jason Brown
Removing dev@ from this conversation, as the thread is more appropriate
for user@.

On Mon, Jun 12, 2017 at 4:51 AM, Eduardo Alonso 
wrote:

> -Virtual tokens are not recommended when using SOLR or
> cassandra-lucene-index.
>
> If you use your table schema you will not have any problem with partition
> size because your table is *not* a WIDE row table (it does not have
> clustering keys)
> The limit for 1 record with those 15 or 20 columns must not be larger that
> 100MB. You will have enough.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
> *
>
> 2017-06-12 12:36 GMT+02:00 @Nandan@ :
>
> > And due to single table videos, maybe it will go with around 15,20
> > columns, then we need to also think very carefully about partition sizes
> > also.
> >
> > On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@  com>
> > wrote:
> >
> >> Yes this is only Option I am also thinking like this as my second
> >> options. Before this I was thinking to do denormalize table based on
> search
> >> columns, but due to partial search this will be not that effective.
> >>
> >> Now suppose , if we are going with this single table as videos. and
> >> implemented with Solr/Lucene, then need to also care about num_tokens ?
> >>
> >>
> >> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso <
> >> eduardoalo...@stratio.com> wrote:
> >>
> >>> Using cassandra collections
> >>>
> >>> CREATE TABLE videos (
> >>> videoid uuid primary key,
> >>> title text,
> >>> actor list<text>,
> >>> producer list<text>,
> >>> release_date timestamp,
> >>> description text,
> >>> music text,
> >>> etc...
> >>> );
> >>>
> >>> When using collection you need to take care of its length. Collections
> >>> are designed to store only a small amount of data.
> >>> 5/10 actors per movie is ok.
> >>>
> >>>
> >>> Eduardo Alonso
> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> >>> 28224 Pozuelo de Alarcón, Madrid
> >>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com //
> *@stratiobd
> >>> *
> >>>
> >>> 2017-06-12 11:54 GMT+02:00 @Nandan@ :
> >>>
> >>>> So In short we have to go with one single table as videos and put
> >>>> primary key as videoid uuid.
> >>>> But then how can we able to handle multiple actor name and producer
> >>>> name. ?
> >>>>
> >>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
> >>>> eduardoalo...@stratio.com> wrote:
> >>>>
> > Yes, you are right.
> >
> > Table denormalization is useful just when you have unique primary
> > keys, not your case.
> > Denormalized tables are only different in its primary key, every
> > denormalized table contains all the data (it just change how it is
> > structured). So, if you need to index it, do it with just one table
> (the
> > one you showed us with videoid as the primary key is ok).
> >
> > Solr, Elastic and cassandra-lucene-index are both based on Lucene and
> > all of them fulfill all your needs.
> >
> > Solr (in DSE) and cassandra-lucene-index
> >  are very well
> > integrated with cassandra using its secondary index interface. If you
> > choose elastic search you will need to code the integration (write
> mutex,
> > both cluster synchronization (imagine something written in cassandra
> but
> > failed to write in elastic))
> >
> > I know i am not the most suitable to recommend you to use our product
> > cassandra-lucene-index
> >  but it is open
> > source, just take a look.
> >
> > Eduardo Alonso
> > Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> > 28224 Pozuelo de Alarcón, Madrid
> > Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
> // *@stratiobd
> > *
> >
> > 2017-06-12 11:18 GMT+02:00 @Nandan@  >:
> >
> >> Hi Eduardo,
> >>
> >> And As we are trying to build an advanced search functionality in
> >> which we can able to do partial search based on actor, producer,
> director,
> >> etc. columns.
> >> So if we do denormalization of tables then we have to create tables
> >> such as below :-
> >> video_by_actor
> >> video_by_producer
> >> video_by_director
> >> video_by_date
> >> etc..
> >> By using denormalized, Cassandra only allows us to do equality
> >> search, but for implementing Partial search we need to implement
> solr on
> >> all above tables.
> 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Eduardo Alonso
Virtual tokens are not recommended when using Solr or
cassandra-lucene-index.

If you use your table schema you will not have any problem with partition
size, because your table is *not* a wide-row table (it does not have
clustering keys).
A single record with those 15 or 20 columns only has to stay under 100MB,
so you will have plenty of room.
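
To put rough numbers on that, here is a back-of-the-envelope sketch. Every
per-column byte size below is an invented assumption for illustration (the
thread gives no real column sizes), and it ignores Cassandra's per-cell
storage overhead; the only figure taken from this message is the 100MB
ceiling.

```python
# Back-of-the-envelope row size check. All per-column sizes are assumed values.
LIMIT_BYTES = 100 * 1024 * 1024  # the 100MB ceiling mentioned in this message

assumed_column_bytes = {
    "videoid": 16,        # a uuid is 16 bytes
    "title": 200,
    "actor": 10 * 50,     # about 10 names of about 50 bytes each
    "producer": 5 * 50,
    "release_date": 8,    # a timestamp is 8 bytes
    "description": 5000,
    "music": 200,
}

# With no clustering keys, one partition holds exactly one row.
row_bytes = sum(assumed_column_bytes.values())
print(f"estimated row: {row_bytes} bytes, "
      f"{row_bytes / LIMIT_BYTES:.6%} of the limit")
```

Even if the real columns turn out to be many times larger than these guesses,
a single-row partition stays far below the ceiling.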

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd*

2017-06-12 12:36 GMT+02:00 @Nandan@ :

> And due to single table videos, maybe it will go with around 15,20
> columns, then we need to also think very carefully about partition sizes
> also.
>
> On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ 
> wrote:
>
>> Yes this is only Option I am also thinking like this as my second
>> options. Before this I was thinking to do denormalize table based on search
>> columns, but due to partial search this will be not that effective.
>>
>> Now suppose , if we are going with this single table as videos. and
>> implemented with Solr/Lucene, then need to also care about num_tokens ?
>>
>>
>> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso <
>> eduardoalo...@stratio.com> wrote:
>>
>>> Using cassandra collections
>>>
>>> CREATE TABLE videos (
>>> videoid uuid primary key,
>>> title text,
>>> actor list<text>,
>>> producer list<text>,
>>> release_date timestamp,
>>> description text,
>>> music text,
>>> etc...
>>> );
>>>
>>> When using collection you need to take care of its length. Collections
>>> are designed to store only a small amount of data.
>>> 5/10 actors per movie is ok.
>>>
>>>
>>> Eduardo Alonso
>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>> 28224 Pozuelo de Alarcón, Madrid
>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
>>> *@stratiobd
>>> *
>>>
>>> 2017-06-12 11:54 GMT+02:00 @Nandan@ :
>>>
>>>> So In short we have to go with one single table as videos and put
>>>> primary key as videoid uuid.
>>>> But then how can we able to handle multiple actor name and producer
>>>> name. ?
>>>>
>>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
>>>> eduardoalo...@stratio.com> wrote:
>>>>
> Yes, you are right.
>
> Table denormalization is useful just when you have unique primary
> keys, not your case.
> Denormalized tables are only different in its primary key, every
> denormalized table contains all the data (it just change how it is
> structured). So, if you need to index it, do it with just one table (the
> one you showed us with videoid as the primary key is ok).
>
> Solr, Elastic and cassandra-lucene-index are both based on Lucene and
> all of them fulfill all your needs.
>
> Solr (in DSE) and cassandra-lucene-index
>  are very well
> integrated with cassandra using its secondary index interface. If you
> choose elastic search you will need to code the integration (write mutex,
> both cluster synchronization (imagine something written in cassandra but
> failed to write in elastic))
>
> I know i am not the most suitable to recommend you to use our product
> cassandra-lucene-index
>  but it is open
> source, just take a look.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
> *@stratiobd
> *
>
> 2017-06-12 11:18 GMT+02:00 @Nandan@ :
>
>> Hi Eduardo,
>>
>> And As we are trying to build an advanced search functionality in
>> which we can able to do partial search based on actor, producer, 
>> director,
>> etc. columns.
>> So if we do denormalization of tables then we have to create tables
>> such as below :-
>> video_by_actor
>> video_by_producer
>> video_by_director
>> video_by_date
>> etc..
>> By using denormalized, Cassandra only allows us to do equality
>> search, but for implementing Partial search we need to implement solr on
>> all above tables.
>>
>> This is my thinking, but I think this will be not correct way to
>> implement Apache Solr on all tables.
>>
>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ <
>> nandanpriyadarshi...@gmail.com> wrote:
>>
>>> Hi Edurado,
>>>
>>> As you mentioned queries 1-6 ,
>>> In this condition, we have to proceed with a table like as below :-
>>> create table videos (
>>> 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
And with a single videos table, it may end up with around 15 to 20 columns,
so we also need to think very carefully about partition sizes.

On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ 
wrote:

> Yes this is only Option I am also thinking like this as my second options.
> Before this I was thinking to do denormalize table based on search columns,
> but due to partial search this will be not that effective.
>
> Now suppose , if we are going with this single table as videos. and
> implemented with Solr/Lucene, then need to also care about num_tokens ?
>
>
> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso  > wrote:
>
>> Using cassandra collections
>>
>> CREATE TABLE videos (
>> videoid uuid primary key,
>> title text,
>> actor list<text>,
>> producer list<text>,
>> release_date timestamp,
>> description text,
>> music text,
>> etc...
>> );
>>
>> When using collection you need to take care of its length. Collections
>> are designed to store only a small amount of data.
>> 5/10 actors per movie is ok.
>>
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
>> *@stratiobd
>> *
>>
>> 2017-06-12 11:54 GMT+02:00 @Nandan@ :
>>
>>> So In short we have to go with one single table as videos and put
>>> primary key as videoid uuid.
>>> But then how can we able to handle multiple actor name and producer
>>> name. ?
>>>
>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
>>> eduardoalo...@stratio.com> wrote:
>>>
 Yes, you are right.

 Table denormalization is useful just when you have unique primary keys,
 not your case.
 Denormalized tables are only different in its primary key, every
 denormalized table contains all the data (it just change how it is
 structured). So, if you need to index it, do it with just one table (the
 one you showed us with videoid as the primary key is ok).

 Solr, Elastic and cassandra-lucene-index are both based on Lucene and
 all of them fulfill all your needs.

 Solr (in DSE) and cassandra-lucene-index
  are very well
 integrated with cassandra using its secondary index interface. If you
 choose elastic search you will need to code the integration (write mutex,
 both cluster synchronization (imagine something written in cassandra but
 failed to write in elastic))

 I know i am not the most suitable to recommend you to use our product
 cassandra-lucene-index
  but it is open
 source, just take a look.

 Eduardo Alonso
 Vía de las dos Castillas, 33, Ática 4, 3ª Planta
 28224 Pozuelo de Alarcón, Madrid
 Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
 *@stratiobd
 *

 2017-06-12 11:18 GMT+02:00 @Nandan@ :

> Hi Eduardo,
>
> And As we are trying to build an advanced search functionality in
> which we can able to do partial search based on actor, producer, director,
> etc. columns.
> So if we do denormalization of tables then we have to create tables
> such as below :-
> video_by_actor
> video_by_producer
> video_by_director
> video_by_date
> etc..
> By using denormalized, Cassandra only allows us to do equality search,
> but for implementing Partial search we need to implement solr on all above
> tables.
>
> This is my thinking, but I think this will be not correct way to
> implement Apache Solr on all tables.
>
> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ <
> nandanpriyadarshi...@gmail.com> wrote:
>
>> Hi Edurado,
>>
>> As you mentioned queries 1-6 ,
>> In this condition, we have to proceed with a table like as below :-
>> create table videos (
>> videoid uuid primary key,
>> title text,
>> actor text,
>> producer text,
>> release_date timestamp,
>> description text,
>> music text,
>> etc...
>> );
>> This table will help to store video datas based on PK videoid and
>> will give uniqeness due to uuid.
>> But as we know , in one movie there are multiple actor, multiple
>> producer, multiple music worked, So how can we store all these.. Only one
>> option will left as to use collection type columns.
>>
>>
>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
>> eduardoalo...@stratio.com> wrote:
>>
>>> TLDR shouldBe *PD
>>>
>>> Eduardo Alonso
>>> Vía de las dos Castillas, 33, Ática 4, 3ª 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
Yes, this is the only option; I was also considering it as my second option.
Before this I was thinking of denormalizing the table by search column,
but because of partial search that will not be very effective.

Now suppose we go with this single videos table and implement search with
Solr/Lucene: do we then also need to care about num_tokens?


On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso 
wrote:

> Using cassandra collections
>
> CREATE TABLE videos (
> videoid uuid primary key,
> title text,
> actor list<text>,
> producer list<text>,
> release_date timestamp,
> description text,
> music text,
> etc...
> );
>
> When using collection you need to take care of its length. Collections
> are designed to store only a small amount of data.
> 5/10 actors per movie is ok.
>
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
> *@stratiobd
> *
>
> 2017-06-12 11:54 GMT+02:00 @Nandan@ :
>
>> So In short we have to go with one single table as videos and put primary
>> key as videoid uuid.
>> But then how can we able to handle multiple actor name and producer name.
>> ?
>>
>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
>> eduardoalo...@stratio.com> wrote:
>>
>>> Yes, you are right.
>>>
>>> Table denormalization is useful just when you have unique primary keys,
>>> not your case.
>>> Denormalized tables are only different in its primary key, every
>>> denormalized table contains all the data (it just change how it is
>>> structured). So, if you need to index it, do it with just one table (the
>>> one you showed us with videoid as the primary key is ok).
>>>
>>> Solr, Elastic and cassandra-lucene-index are both based on Lucene and
>>> all of them fulfill all your needs.
>>>
>>> Solr (in DSE) and cassandra-lucene-index
>>>  are very well
>>> integrated with cassandra using its secondary index interface. If you
>>> choose elastic search you will need to code the integration (write mutex,
>>> both cluster synchronization (imagine something written in cassandra but
>>> failed to write in elastic))
>>>
>>> I know i am not the most suitable to recommend you to use our product
>>> cassandra-lucene-index
>>>  but it is open
>>> source, just take a look.
>>>
>>> Eduardo Alonso
>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>> 28224 Pozuelo de Alarcón, Madrid
>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // 
>>> *@stratiobd
>>> *
>>>
>>> 2017-06-12 11:18 GMT+02:00 @Nandan@ :
>>>
 Hi Eduardo,

 And As we are trying to build an advanced search functionality in which
 we can able to do partial search based on actor, producer, director, etc.
 columns.
 So if we do denormalization of tables then we have to create tables
 such as below :-
 video_by_actor
 video_by_producer
 video_by_director
 video_by_date
 etc..
 By using denormalized, Cassandra only allows us to do equality search,
 but for implementing Partial search we need to implement solr on all above
 tables.

 This is my thinking, but I think this will be not correct way to
 implement Apache Solr on all tables.

 On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ <
 nandanpriyadarshi...@gmail.com> wrote:

> Hi Edurado,
>
> As you mentioned queries 1-6 ,
> In this condition, we have to proceed with a table like as below :-
> create table videos (
> videoid uuid primary key,
> title text,
> actor text,
> producer text,
> release_date timestamp,
> description text,
> music text,
> etc...
> );
> This table will help to store video datas based on PK videoid and will
> give uniqeness due to uuid.
> But as we know , in one movie there are multiple actor, multiple
> producer, multiple music worked, So how can we store all these.. Only one
> option will left as to use collection type columns.
>
>
> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
> eduardoalo...@stratio.com> wrote:
>
>> TLDR shouldBe *PD
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
>>  // *@stratiobd *
>>
>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso 
>> :
>>
>>> Hi Nandan:
>>>
>>> So, your system must provide these queries:
>>>
>>> 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Eduardo Alonso
Using cassandra collections

CREATE TABLE videos (
videoid uuid primary key,
title text,
actor list<text>,
producer list<text>,
release_date timestamp,
description text,
music text,
etc...
);

When using a collection you need to take care of its length. Collections are
designed to store only a small amount of data.
5 to 10 actors per movie is OK.


Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd*

2017-06-12 11:54 GMT+02:00 @Nandan@ :

> So In short we have to go with one single table as videos and put primary
> key as videoid uuid.
> But then how can we able to handle multiple actor name and producer name.
> ?
>
> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso  > wrote:
>
>> Yes, you are right.
>>
>> Table denormalization is useful just when you have unique primary keys,
>> not your case.
>> Denormalized tables are only different in its primary key, every
>> denormalized table contains all the data (it just change how it is
>> structured). So, if you need to index it, do it with just one table (the
>> one you showed us with videoid as the primary key is ok).
>>
>> Solr, Elastic and cassandra-lucene-index are both based on Lucene and all
>> of them fulfill all your needs.
>>
>> Solr (in DSE) and cassandra-lucene-index
>>  are very well
>> integrated with cassandra using its secondary index interface. If you
>> choose elastic search you will need to code the integration (write mutex,
>> both cluster synchronization (imagine something written in cassandra but
>> failed to write in elastic))
>>
>> I know i am not the most suitable to recommend you to use our product
>> cassandra-lucene-index
>>  but it is open
>> source, just take a look.
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>>
>> 2017-06-12 11:18 GMT+02:00 @Nandan@ :
>>
>>> Hi Eduardo,
>>>
>>> And As we are trying to build an advanced search functionality in which
>>> we can able to do partial search based on actor, producer, director, etc.
>>> columns.
>>> So if we do denormalization of tables then we have to create tables such
>>> as below :-
>>> video_by_actor
>>> video_by_producer
>>> video_by_director
>>> video_by_date
>>> etc..
>>> By using denormalized, Cassandra only allows us to do equality search,
>>> but for implementing Partial search we need to implement solr on all above
>>> tables.
>>>
>>> This is my thinking, but I think this will be not correct way to
>>> implement Apache Solr on all tables.
>>>
>>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ wrote:
>>>
 Hi Edurado,

 As you mentioned queries 1-6 ,
 In this condition, we have to proceed with a table like as below :-
 create table videos (
 videoid uuid primary key,
 title text,
 actor text,
 producer text,
 release_date timestamp,
 description text,
 music text,
 etc...
 );
 This table will help to store video datas based on PK videoid and will
 give uniqeness due to uuid.
 But as we know , in one movie there are multiple actor, multiple
 producer, multiple music worked, So how can we store all these.. Only one
 option will left as to use collection type columns.


 On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
 eduardoalo...@stratio.com> wrote:

> TLDR shouldBe *PD
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>
> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso :
>
>> Hi Nandan:
>>
>> So, your system must provide these queries:
>>
>> 1 - SELECT video FROM ... WHERE actor = '...';
>> 2 - SELECT video FROM ... WHERE producer = '...';
>> 3 - SELECT video FROM ... WHERE music = '...';
>> 4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
>> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
>> 6 - SELECT video WHERE title CONTAINS 'Harry';
>>
>>
>> For queries 1-5 you can get them with just cassandra, denormalizing
>> tables just the way your mentioned but without solr, just cassandra
>> (Indeed, just for equality clauses)
>>
>> video_by_actor;
>> video_by_producer;

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
So, in short, we have to go with one single videos table, with videoid uuid
as the primary key.
But then how can we handle multiple actor names and producer names?

On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso 
wrote:

> Yes, you are right.
>
> Table denormalization is useful just when you have unique primary keys,
> not your case.
> Denormalized tables are only different in its primary key, every
> denormalized table contains all the data (it just change how it is
> structured). So, if you need to index it, do it with just one table (the
> one you showed us with videoid as the primary key is ok).
>
> Solr, Elastic and cassandra-lucene-index are both based on Lucene and all
> of them fulfill all your needs.
>
> Solr (in DSE) and cassandra-lucene-index
>  are very well
> integrated with cassandra using its secondary index interface. If you
> choose elastic search you will need to code the integration (write mutex,
> both cluster synchronization (imagine something written in cassandra but
> failed to write in elastic))
>
> I know i am not the most suitable to recommend you to use our product
> cassandra-lucene-index 
> but it is open source, just take a look.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>
> 2017-06-12 11:18 GMT+02:00 @Nandan@ :
>
>> Hi Eduardo,
>>
>> And As we are trying to build an advanced search functionality in which
>> we can able to do partial search based on actor, producer, director, etc.
>> columns.
>> So if we do denormalization of tables then we have to create tables such
>> as below :-
>> video_by_actor
>> video_by_producer
>> video_by_director
>> video_by_date
>> etc..
>> By using denormalized, Cassandra only allows us to do equality search,
>> but for implementing Partial search we need to implement solr on all above
>> tables.
>>
>> This is my thinking, but I think this will be not correct way to
>> implement Apache Solr on all tables.
>>
>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ wrote:
>>
>>> Hi Edurado,
>>>
>>> As you mentioned queries 1-6 ,
>>> In this condition, we have to proceed with a table like as below :-
>>> create table videos (
>>> videoid uuid primary key,
>>> title text,
>>> actor text,
>>> producer text,
>>> release_date timestamp,
>>> description text,
>>> music text,
>>> etc...
>>> );
>>> This table will help to store video datas based on PK videoid and will
>>> give uniqeness due to uuid.
>>> But as we know , in one movie there are multiple actor, multiple
>>> producer, multiple music worked, So how can we store all these.. Only one
>>> option will left as to use collection type columns.
>>>
>>>
>>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
>>> eduardoalo...@stratio.com> wrote:
>>>
 TLDR shouldBe *PD

 Eduardo Alonso
 Vía de las dos Castillas, 33, Ática 4, 3ª Planta
 28224 Pozuelo de Alarcón, Madrid
 Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

 2017-06-12 10:58 GMT+02:00 Eduardo Alonso :

> Hi Nandan:
>
> So, your system must provide these queries:
>
> 1 - SELECT video FROM ... WHERE actor = '...';
> 2 - SELECT video FROM ... WHERE producer = '...';
> 3 - SELECT video FROM ... WHERE music = '...';
> 4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
> 6 - SELECT video WHERE title CONTAINS 'Harry';
>
>
> For queries 1-5 you can get them with just cassandra, denormalizing
> tables just the way your mentioned but without solr, just cassandra
> (Indeed, just for equality clauses)
>
> video_by_actor;
> video_by_producer;
> video_by_music;
> video_by_actor_and_producer;
> video_by_actor_and_music;
>
> For queries number 6 you need a search engine.
>
> SOL
> ElasticSearch
> cassandra-lucene-index
> 
> SASI
> 
>
> I think, just for your query,  the easiest way to get it is to build a
> SASI index.
> TLDR: I work for stratio in cassandra-lucene-index but for your basic
> query (only one dimension), SASI indexes will work for you.
>
>
>
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
> 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Eduardo Alonso
Yes, you are right.

Table denormalization is only useful when you have unique primary keys,
which is not your case.
Denormalized tables differ only in their primary key; every denormalized
table contains all the data (only the way it is structured changes). So, if
you need to index it, do it on just one table (the one you showed us, with
videoid as the primary key, is fine).

Solr, Elasticsearch and cassandra-lucene-index are all based on Lucene, and
all of them cover your needs.

Solr (in DSE) and cassandra-lucene-index are very well integrated with
Cassandra through its secondary index interface. If you choose Elasticsearch
you will need to code the integration yourself (write mutex, synchronization
of both clusters; imagine something written to Cassandra that failed to be
written to Elasticsearch).

I know I am not the most impartial person to recommend our own product,
cassandra-lucene-index, but it is open source, so just take a look.

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

2017-06-12 11:18 GMT+02:00 @Nandan@ :

> Hi Eduardo,
>
> And As we are trying to build an advanced search functionality in which we
> can able to do partial search based on actor, producer, director, etc.
> columns.
> So if we do denormalization of tables then we have to create tables such
> as below :-
> video_by_actor
> video_by_producer
> video_by_director
> video_by_date
> etc..
> By using denormalized, Cassandra only allows us to do equality search, but
> for implementing Partial search we need to implement solr on all above
> tables.
>
> This is my thinking, but I think this will be not correct way to implement
> Apache Solr on all tables.
>
> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ 
> wrote:
>
>> Hi Edurado,
>>
>> As you mentioned queries 1-6 ,
>> In this condition, we have to proceed with a table like as below :-
>> create table videos (
>> videoid uuid primary key,
>> title text,
>> actor text,
>> producer text,
>> release_date timestamp,
>> description text,
>> music text,
>> etc...
>> );
>> This table will help to store video datas based on PK videoid and will
>> give uniqeness due to uuid.
>> But as we know , in one movie there are multiple actor, multiple
>> producer, multiple music worked, So how can we store all these.. Only one
>> option will left as to use collection type columns.
>>
>>
>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
>> eduardoalo...@stratio.com> wrote:
>>
>>> TLDR shouldBe *PD
>>>
>>> Eduardo Alonso
>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>> 28224 Pozuelo de Alarcón, Madrid
>>> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>>>
>>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso :
>>>
 Hi Nandan:

 So, your system must provide these queries:

 1 - SELECT video FROM ... WHERE actor = '...';
 2 - SELECT video FROM ... WHERE producer = '...';
 3 - SELECT video FROM ... WHERE music = '...';
 4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
 6 - SELECT video WHERE title CONTAINS 'Harry';


 For queries 1-5 you can get them with just cassandra, denormalizing
 tables just the way your mentioned but without solr, just cassandra
 (Indeed, just for equality clauses)

 video_by_actor;
 video_by_producer;
 video_by_music;
 video_by_actor_and_producer;
 video_by_actor_and_music;

 For queries number 6 you need a search engine.

 SOL
 ElasticSearch
 cassandra-lucene-index
 
 SASI
 

 I think, just for your query,  the easiest way to get it is to build a
 SASI index.
 TLDR: I work for stratio in cassandra-lucene-index but for your basic
 query (only one dimension), SASI indexes will work for you.




 Eduardo Alonso
 Vía de las dos Castillas, 33, Ática 4, 3ª Planta
 28224 Pozuelo de Alarcón, Madrid
 Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

 2017-06-12 9:50 GMT+02:00 @Nandan@ :

> But Condition is , I am working with Apache Cassandra Database in
> which I have to store my data into Cassandra and then have to implement
> partial search capability.
> If we need to search based on full search  primary key, then it really
> best and easy to work with Cassandra , but in case of 

Re: Convert single node C* to cluster (rebalancing problem)

2017-06-12 Thread Akhil Mehra
auto_bootstrap is true by default. Ensure it is set to true. On startup,
check your logs for the auto_bootstrap value: look at the node configuration
line in your log file.

Akhil

On Mon, Jun 12, 2017 at 6:18 PM, Junaid Nasir  wrote:

> No, I didn't set it (left it at default value)
>
> On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A  wrote:
>
>> Did you make sure auto_bootstrap property is indeed set to [true] when
>> you added the node?
>>
>>
>>
>> *From:* Junaid Nasir [mailto:jna...@an10.io]
>> *Sent:* Monday, June 05, 2017 6:29 AM
>> *To:* Akhil Mehra 
>> *Cc:* Vladimir Yudovin ; user@cassandra.apache.org
>> *Subject:* Re: Convert single node C* to cluster (rebalancing problem)
>>
>>
>>
>> not evenly, i have setup a new cluster with subset of data (around 5gb).
>> using the configuration above I am getting these results
>>
>>
>>
>> Datacenter: datacenter1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.128.2.1   4.86 GiB    256     44.9%             e4427611-c247-42ee-9404-371e177f5f17  rack1
>> UN  10.128.2.10  725.03 MiB  256     55.1%             690d5620-99d3-4ae3-aebe-8f33af54a08b  rack1
>>
>> is there anything else I can tweak/check to make the distribution even?
>>
>>
>>
>> On Sat, Jun 3, 2017 at 3:30 AM, Akhil Mehra  wrote:
>>
>> So now the data is evenly balanced in both nodes?
>>
>>
>>
>> Refer to the following documentation to get a better understanding of the
>> rpc_address and the broadcast_rpc_address:
>> https://www.instaclustr.com/demystifying-cassandras-broadcast_address/
>> I am surprised that your node started up with rpc_broadcast_address set, as
>> this is an unsupported property. I am assuming you are using Cassandra
>> version 3.10.
>>
>>
>>
>>
>>
>> Regards,
>>
>> Akhil
>>
>>
>>
>> On 2/06/2017, at 11:06 PM, Junaid Nasir  wrote:
>>
>>
>>
>> I am able to get it working. I added a new node with following changes
>>
>> #rpc_address:0.0.0.0
>>
>> rpc_address: 10.128.1.11
>>
>> #rpc_broadcast_address:10.128.1.11
>>
>> rpc_address was set to 0.0.0.0 (I ran into a problem previously
>> regarding remote connection and made these changes:
>> https://stackoverflow.com/questions/12236898/apache-cassandra-remote-access
>> )
>>
>>
>>
>> should it be happening?
>>
>>
>>
>> On Thu, Jun 1, 2017 at 6:31 PM, Vladimir Yudovin 
>> wrote:
>>
>> Did you run "nodetool cleanup" on first node after second was
>> bootstrapped? It should clean rows not belonging to node after tokens
>> changed.
>>
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>> Winguzone - Cloud Cassandra Hosting
>>
>>
>>
>>
>>
>> On Wed, 31 May 2017 03:55:54 -0400 Junaid Nasir wrote:
>>
>>
>>
>> Cassandra is supposed to make adding or removing nodes very easy and to
>> balance the load between nodes when a change is made, but it's not working
>> in my case.
>>
>> I have a single node C* deployment (with 270 GB of data) and want to load
>> balance the data on multiple nodes, I followed this guide
>> 
>>
>>
>> `nodetool status` shows 2 nodes but load is not balanced between them
>>
>> Datacenter: dc1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.128.0.7   270.75 GiB  256     48.6%             1a3f6faa-4376-45a8-9c20-11480ae5664c  rack1
>> UN  10.128.0.14  414.36 KiB  256     51.4%             66a89fbf-08ba-4b5d-9f10-55d52a199b41  rack1
>>
>> I also ran 'nodetool repair' on new node but result is same. any pointers
>> would be appreciated :)
>>
>>
>>
>> conf file of new node
>>
>> cluster_name: 'cluster1'

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
Hi Eduardo,

We are trying to build an advanced search feature that supports partial
search on the actor, producer, director, etc. columns.
So if we denormalize the tables, we have to create tables such as the
following:
video_by_actor
video_by_producer
video_by_director
video_by_date
etc..
With denormalized tables, Cassandra only allows equality search, so to
implement partial search we would need to set up Solr on all of the above
tables.

This is my thinking, but I suspect that setting up Apache Solr on every
table is not the right approach.

On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ 
wrote:

> Hi Edurado,
>
> As you mentioned queries 1-6 ,
> In this condition, we have to proceed with a table like as below :-
> create table videos (
> videoid uuid primary key,
> title text,
> actor text,
> producer text,
> release_date timestamp,
> description text,
> music text,
> etc...
> );
> This table will help to store video datas based on PK videoid and will
> give uniqeness due to uuid.
> But as we know , in one movie there are multiple actor, multiple producer,
> multiple music worked, So how can we store all these.. Only one option will
> left as to use collection type columns.
>
>
> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso wrote:
>
>> TLDR shouldBe *PD
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>>
>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso :
>>
>>> Hi Nandan:
>>>
>>> So, your system must provide these queries:
>>>
>>> 1 - SELECT video FROM ... WHERE actor = '...';
>>> 2 - SELECT video FROM ... WHERE producer = '...';
>>> 3 - SELECT video FROM ... WHERE music = '...';
>>> 4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
>>> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
>>> 6 - SELECT video WHERE title CONTAINS 'Harry';
>>>
>>>
>>> For queries 1-5 you can get them with just cassandra, denormalizing
>>> tables just the way your mentioned but without solr, just cassandra
>>> (Indeed, just for equality clauses)
>>>
>>> video_by_actor;
>>> video_by_producer;
>>> video_by_music;
>>> video_by_actor_and_producer;
>>> video_by_actor_and_music;
>>>
>>> For queries number 6 you need a search engine.
>>>
>>> SOL
>>> ElasticSearch
>>> cassandra-lucene-index
>>> 
>>> SASI
>>> 
>>>
>>> I think, just for your query,  the easiest way to get it is to build a
>>> SASI index.
>>> TLDR: I work for stratio in cassandra-lucene-index but for your basic
>>> query (only one dimension), SASI indexes will work for you.
>>>
>>>
>>>
>>>
>>> Eduardo Alonso
>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>> 28224 Pozuelo de Alarcón, Madrid
>>> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>>>
>>> 2017-06-12 9:50 GMT+02:00 @Nandan@ :
>>>
 But Condition is , I am working with Apache Cassandra Database in which
 I have to store my data into Cassandra and then have to implement partial
 search capability.
 If we need to search based on full search  primary key, then it really
 best and easy to work with Cassandra , but in case of flexible search , I
 am getting confused.


 On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin  wrote:

> I haven't run solr with Cassandra myself. I just meant to run
> elasticsearch as a completely separate service and write there as well.
>
> On 12 Jun 2017, at 09:45, @Nandan@ 
> wrote:
>
> Do you mean to use Elastic Search with Cassandra?
> Even I am thinking to use Apache Solr With Cassandra.
> In that case I have to create distributed tables such as:-
> 1) video_by_title, video_by_actor, video_by_year  etc..
> 2) After creating Tables , will have to configure solr core on all
> tables.
>
> Is it like this ?
>
>
>
>
>
> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin <
> oskar.kjel...@gmail.com> wrote:
>
>> Why not elasticsearch for this use case? It will make your life much
>> simpler
>>
>> > On 12 Jun 2017, at 04:40, @Nandan@ 
>> wrote:
>> >
>> > Hi,
>> >
>> > Currently, I am working on data modeling for Video Company in which
>> we have different types of users as well as different user functionality.
>> > But currently, my concern is about Search video module based on
>> different fields.
>> >
>> > Query patterns are as below:-

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
Hi Eduardo,

Given queries 1-6 that you listed, we would have to proceed with a table
like the one below:
CREATE TABLE videos (
    videoid uuid PRIMARY KEY,
    title text,
    actor text,
    producer text,
    release_date timestamp,
    description text,
    music text
    -- etc...
);
This table stores video data keyed by videoid, and the uuid guarantees
uniqueness.
But as we know, one movie has multiple actors, multiple producers and
multiple music contributors, so how can we store all of these? The only
option left seems to be collection-type columns.


On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso 
wrote:

> TLDR shouldBe *PD
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>
> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso :
>
>> Hi Nandan:
>>
>> So, your system must provide these queries:
>>
>> 1 - SELECT video FROM ... WHERE actor = '...';
>> 2 - SELECT video FROM ... WHERE producer = '...';
>> 3 - SELECT video FROM ... WHERE music = '...';
>> 4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
>> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
>> 6 - SELECT video WHERE title CONTAINS 'Harry';
>>
>>
>> For queries 1-5 you can get them with just cassandra, denormalizing
>> tables just the way your mentioned but without solr, just cassandra
>> (Indeed, just for equality clauses)
>>
>> video_by_actor;
>> video_by_producer;
>> video_by_music;
>> video_by_actor_and_producer;
>> video_by_actor_and_music;
>>
>> For queries number 6 you need a search engine.
>>
>> SOL
>> ElasticSearch
>> cassandra-lucene-index
>> 
>> SASI
>> 
>>
>> I think, just for your query,  the easiest way to get it is to build a
>> SASI index.
>> TLDR: I work for stratio in cassandra-lucene-index but for your basic
>> query (only one dimension), SASI indexes will work for you.
>>
>>
>>
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>>
>> 2017-06-12 9:50 GMT+02:00 @Nandan@ :
>>
>>> But Condition is , I am working with Apache Cassandra Database in which
>>> I have to store my data into Cassandra and then have to implement partial
>>> search capability.
>>> If we need to search based on full search  primary key, then it really
>>> best and easy to work with Cassandra , but in case of flexible search , I
>>> am getting confused.
>>>
>>>
>>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin 
>>> wrote:
>>>
 I haven't run solr with Cassandra myself. I just meant to run
 elasticsearch as a completely separate service and write there as well.

 On 12 Jun 2017, at 09:45, @Nandan@ 
 wrote:

 Do you mean to use Elastic Search with Cassandra?
 Even I am thinking to use Apache Solr With Cassandra.
 In that case I have to create distributed tables such as:-
 1) video_by_title, video_by_actor, video_by_year  etc..
 2) After creating Tables , will have to configure solr core on all
 tables.

 Is it like this ?





 On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin  wrote:

> Why not elasticsearch for this use case? It will make your life much
> simpler
>
> > On 12 Jun 2017, at 04:40, @Nandan@ 
> wrote:
> >
> > Hi,
> >
> > Currently, I am working on data modeling for Video Company in which
> we have different types of users as well as different user functionality.
> > But currently, my concern is about Search video module based on
> different fields.
> >
> > Query patterns are as below:-
> > 1) Select video by actor.
> > 2) select video by producer.
> > 3) select video by music.
> > 4) select video by actor and producer.
> > 5) select video by actor and music.
> >
> > Note: - In short, We want to establish an advanced search module by
> which we can search by anyway and get the desired results.
> >
> > During a search , we need partial search also such that if any user
> can search "Harry" title, then we are able to give them result as all
> videos whose
> >  title contains "Harry" at any location.
> >
> > As per my ideas, I have to create separate tables such as
> video_by_actor, video_by_producer etc.. and implement solr query on all
> tables. Otherwise,
> > is there any others way by which we can implement this search 

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Eduardo Alonso
Correction: "TLDR" should have been "P.S."

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

2017-06-12 10:58 GMT+02:00 Eduardo Alonso :

> Hi Nandan:
>
> So, your system must provide these queries:
>
> 1 - SELECT video FROM ... WHERE actor = '...';
> 2 - SELECT video FROM ... WHERE producer = '...';
> 3 - SELECT video FROM ... WHERE music = '...';
> 4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
> 6 - SELECT video WHERE title CONTAINS 'Harry';
>
>
> For queries 1-5 you can get them with just cassandra, denormalizing tables
> just the way your mentioned but without solr, just cassandra (Indeed, just
> for equality clauses)
>
> video_by_actor;
> video_by_producer;
> video_by_music;
> video_by_actor_and_producer;
> video_by_actor_and_music;
>
> For queries number 6 you need a search engine.
>
> SOL
> ElasticSearch
> cassandra-lucene-index 
> SASI
> 
>
> I think, just for your query,  the easiest way to get it is to build a
> SASI index.
> TLDR: I work for stratio in cassandra-lucene-index but for your basic
> query (only one dimension), SASI indexes will work for you.
>
>
>
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>
> 2017-06-12 9:50 GMT+02:00 @Nandan@ :
>
>> But Condition is , I am working with Apache Cassandra Database in which I
>> have to store my data into Cassandra and then have to implement partial
>> search capability.
>> If we need to search based on full search  primary key, then it really
>> best and easy to work with Cassandra , but in case of flexible search , I
>> am getting confused.
>>
>>
>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin 
>> wrote:
>>
>>> I haven't run solr with Cassandra myself. I just meant to run
>>> elasticsearch as a completely separate service and write there as well.
>>>
>>> On 12 Jun 2017, at 09:45, @Nandan@ 
>>> wrote:
>>>
>>> Do you mean to use Elastic Search with Cassandra?
>>> Even I am thinking to use Apache Solr With Cassandra.
>>> In that case I have to create distributed tables such as:-
>>> 1) video_by_title, video_by_actor, video_by_year  etc..
>>> 2) After creating Tables , will have to configure solr core on all
>>> tables.
>>>
>>> Is it like this ?
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin 
>>> wrote:
>>>
 Why not elasticsearch for this use case? It will make your life much
 simpler

 > On 12 Jun 2017, at 04:40, @Nandan@ 
 wrote:
 >
 > Hi,
 >
 > Currently, I am working on data modeling for Video Company in which
 we have different types of users as well as different user functionality.
 > But currently, my concern is about Search video module based on
 different fields.
 >
 > Query patterns are as below:-
 > 1) Select video by actor.
 > 2) select video by producer.
 > 3) select video by music.
 > 4) select video by actor and producer.
 > 5) select video by actor and music.
 >
 > Note: - In short, We want to establish an advanced search module by
 which we can search by anyway and get the desired results.
 >
 > During a search , we need partial search also such that if any user
 can search "Harry" title, then we are able to give them result as all
 videos whose
 >  title contains "Harry" at any location.
 >
 > As per my ideas, I have to create separate tables such as
 video_by_actor, video_by_producer etc.. and implement solr query on all
 tables. Otherwise,
 > is there any others way by which we can implement this search module
 effectively.
 >
 > Please suggest.
 >
 > Best regards,

>>>
>>>
>>
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Eduardo Alonso
Hi Nandan:

So, your system must provide these queries:

1 - SELECT video FROM ... WHERE actor = '...';
2 - SELECT video FROM ... WHERE producer = '...';
3 - SELECT video FROM ... WHERE music = '...';
4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
6 - SELECT video FROM ... WHERE title CONTAINS 'Harry';


Queries 1-5 can be served by Cassandra alone, by denormalizing tables the
way you mentioned but without Solr (this only works for equality clauses):

video_by_actor;
video_by_producer;
video_by_music;
video_by_actor_and_producer;
video_by_actor_and_music;
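
A minimal sketch of what two of these tables might look like. The column
sets are assumptions made for illustration (only the table names come from
the list above); each table repeats the video data and differs only in its
primary key.

```cql
-- Query 1: partition by actor, one row per video of that actor.
CREATE TABLE video_by_actor (
    actor text,
    videoid uuid,
    title text,
    PRIMARY KEY (actor, videoid)
);

SELECT videoid, title FROM video_by_actor WHERE actor = 'Some Actor';

-- Query 4: composite partition key, so both values must be supplied.
CREATE TABLE video_by_actor_and_producer (
    actor text,
    producer text,
    videoid uuid,
    title text,
    PRIMARY KEY ((actor, producer), videoid)
);

SELECT videoid, title FROM video_by_actor_and_producer
WHERE actor = 'Some Actor' AND producer = 'Some Producer';
```

With this layout the application has to write every video once per actor
(and once per actor/producer pair), which is the cost of serving these
lookups with plain equality queries.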

For query number 6 you need a search engine:

Solr
Elasticsearch
cassandra-lucene-index
SASI


I think that, just for your query, the easiest way to get it is to build a
SASI index.
TLDR: I work for Stratio on cassandra-lucene-index, but for your basic query
(only one dimension), SASI indexes will work for you.
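
A sketch of such a SASI index, assuming a videos table with a title text
column and Cassandra 3.4 or later; the index name is hypothetical and the
options shown are one reasonable choice rather than a recommendation.

```cql
-- CONTAINS mode allows matching a substring anywhere in the title.
CREATE CUSTOM INDEX videos_title_sasi ON videos (title)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
};

-- Query 6: all videos whose title contains "Harry" at any position.
SELECT videoid, title FROM videos WHERE title LIKE '%Harry%';
```

The non-tokenizing analyzer with case_sensitive set to false is used here so
that "harry" and "Harry" match alike; drop the analyzer options if exact-case
matching is wanted.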




Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

2017-06-12 9:50 GMT+02:00 @Nandan@ :

> But Condition is , I am working with Apache Cassandra Database in which I
> have to store my data into Cassandra and then have to implement partial
> search capability.
> If we need to search based on full search  primary key, then it really
> best and easy to work with Cassandra , but in case of flexible search , I
> am getting confused.
>
>
> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin 
> wrote:
>
>> I haven't run solr with Cassandra myself. I just meant to run
>> elasticsearch as a completely separate service and write there as well.
>>
>> On 12 Jun 2017, at 09:45, @Nandan@ 
>> wrote:
>>
>> Do you mean to use Elastic Search with Cassandra?
>> Even I am thinking to use Apache Solr With Cassandra.
>> In that case I have to create distributed tables such as:-
>> 1) video_by_title, video_by_actor, video_by_year  etc..
>> 2) After creating Tables , will have to configure solr core on all
>> tables.
>>
>> Is it like this ?
>>
>>
>>
>>
>>
>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin 
>> wrote:
>>
>>> Why not elasticsearch for this use case? It will make your life much
>>> simpler
>>>
>>> > On 12 Jun 2017, at 04:40, @Nandan@ 
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Currently, I am working on data modeling for Video Company in which we
>>> have different types of users as well as different user functionality.
>>> > But currently, my concern is about Search video module based on
>>> different fields.
>>> >
>>> > Query patterns are as below:-
>>> > 1) Select video by actor.
>>> > 2) select video by producer.
>>> > 3) select video by music.
>>> > 4) select video by actor and producer.
>>> > 5) select video by actor and music.
>>> >
>>> > Note: - In short, We want to establish an advanced search module by
>>> which we can search by anyway and get the desired results.
>>> >
>>> > During a search , we need partial search also such that if any user
>>> can search "Harry" title, then we are able to give them result as all
>>> videos whose
>>> >  title contains "Harry" at any location.
>>> >
>>> > As per my ideas, I have to create separate tables such as
>>> video_by_actor, video_by_producer etc.. and implement solr query on all
>>> tables. Otherwise,
>>> > is there any others way by which we can implement this search module
>>> effectively.
>>> >
>>> > Please suggest.
>>> >
>>> > Best regards,
>>>
>>
>>
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
But the constraint is that I am working with Apache Cassandra: I have to
store my data in Cassandra and then implement partial search on top of it.
Searching by the full primary key is easy and works well in Cassandra, but
when it comes to flexible search I am getting confused.


On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin 
wrote:

> I haven't run solr with Cassandra myself. I just meant to run
> elasticsearch as a completely separate service and write there as well.
>
> On 12 Jun 2017, at 09:45, @Nandan@  wrote:
>
> Do you mean to use Elastic Search with Cassandra?
> Even I am thinking to use Apache Solr With Cassandra.
> In that case I have to create distributed tables such as:-
> 1) video_by_title, video_by_actor, video_by_year  etc..
> 2) After creating Tables , will have to configure solr core on all tables.
>
> Is it like this ?
>
>
>
>
>
> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin 
> wrote:
>
>> Why not elasticsearch for this use case? It will make your life much
>> simpler
>>
>> > On 12 Jun 2017, at 04:40, @Nandan@ 
>> wrote:
>> >
>> > Hi,
>> >
>> > Currently, I am working on data modeling for Video Company in which we
>> have different types of users as well as different user functionality.
>> > But currently, my concern is about Search video module based on
>> different fields.
>> >
>> > Query patterns are as below:-
>> > 1) Select video by actor.
>> > 2) select video by producer.
>> > 3) select video by music.
>> > 4) select video by actor and producer.
>> > 5) select video by actor and music.
>> >
>> > Note: - In short, We want to establish an advanced search module by
>> which we can search by anyway and get the desired results.
>> >
>> > During a search , we need partial search also such that if any user can
>> search "Harry" title, then we are able to give them result as all videos
>> whose
>> >  title contains "Harry" at any location.
>> >
>> > As per my ideas, I have to create separate tables such as
>> video_by_actor, video_by_producer etc.. and implement solr query on all
>> tables. Otherwise,
>> > is there any others way by which we can implement this search module
>> effectively.
>> >
>> > Please suggest.
>> >
>> > Best regards,
>>
>
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Oskar Kjellin
I haven't run Solr with Cassandra myself. I just meant running Elasticsearch as 
a completely separate service and writing to it as well. 
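
To illustrate the "write there as well" idea, here is a minimal Python sketch.
Everything in it is a hypothetical stand-in: PrimaryStore plays the role of
Cassandra, SearchIndex plays the role of a separately run search service, and
save_video is an invented helper. It uses no real driver or client API, only
in-memory dictionaries, so it shows the shape of the pattern rather than a
working integration.

```python
# Sketch of the dual-write pattern with in-memory stand-ins.
# All class and function names are hypothetical, not a real client API.


class PrimaryStore:
    """Stand-in for the system of record: lookup by primary key only."""

    def __init__(self):
        self._rows = {}

    def upsert(self, video_id, row):
        self._rows[video_id] = dict(row)

    def get(self, video_id):
        return self._rows.get(video_id)


class SearchIndex:
    """Stand-in for the search service: flexible queries over indexed fields."""

    def __init__(self):
        self._docs = {}

    def index(self, video_id, doc):
        self._docs[video_id] = dict(doc)

    def search(self, title_contains=None, **exact):
        """Return ids whose title contains the text (case-insensitive)
        and whose other fields match the given values exactly."""
        hits = []
        for video_id, doc in self._docs.items():
            title = doc.get("title", "").lower()
            if title_contains is not None and title_contains.lower() not in title:
                continue
            if any(doc.get(field) != value for field, value in exact.items()):
                continue
            hits.append(video_id)
        return sorted(hits)


def save_video(store, index, video_id, row):
    """Write the same record to the primary store and to the search index."""
    store.upsert(video_id, row)
    index.index(video_id, row)


store, index = PrimaryStore(), SearchIndex()
# Made-up sample records, for illustration only.
save_video(store, index, "v1", {"title": "Harry Goes Home", "actor": "actor_a", "producer": "producer_x"})
save_video(store, index, "v2", {"title": "The Return of Harry", "actor": "actor_b", "producer": "producer_x"})
save_video(store, index, "v3", {"title": "Another Story", "actor": "actor_a", "producer": "producer_y"})

print(index.search(title_contains="harry"))                  # partial title match
print(index.search(actor="actor_a", producer="producer_x"))  # actor and producer
print(store.get("v3"))                                       # primary-key lookup
```

In a real deployment the two writes can fail independently, so the application
would need some way to retry or reconcile them; the sketch ignores that.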

> On 12 Jun 2017, at 09:45, @Nandan@  wrote:
> 
> Do you mean to use Elastic Search with Cassandra?
> Even I am thinking to use Apache Solr With Cassandra. 
> In that case I have to create distributed tables such as:-
> 1) video_by_title, video_by_actor, video_by_year  etc..
> 2) After creating Tables , will have to configure solr core on all tables. 
> 
> Is it like this ?
> 
> 
> 
>  
> 
>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin  
>> wrote:
>> Why not elasticsearch for this use case? It will make your life much simpler
>> 
>> > On 12 Jun 2017, at 04:40, @Nandan@  wrote:
>> >
>> > Hi,
>> >
>> > Currently, I am working on data modeling for Video Company in which we 
>> > have different types of users as well as different user functionality.
>> > But currently, my concern is about Search video module based on different 
>> > fields.
>> >
>> > Query patterns are as below:-
>> > 1) Select video by actor.
>> > 2) select video by producer.
>> > 3) select video by music.
>> > 4) select video by actor and producer.
>> > 5) select video by actor and music.
>> >
>> > Note: - In short, We want to establish an advanced search module by which 
>> > we can search by anyway and get the desired results.
>> >
>> > During a search , we need partial search also such that if any user can 
>> > search "Harry" title, then we are able to give them result as all videos 
>> > whose
>> >  title contains "Harry" at any location.
>> >
>> > As per my ideas, I have to create separate tables such as video_by_actor, 
>> > video_by_producer etc.. and implement solr query on all tables. Otherwise,
>> > is there any others way by which we can implement this search module 
>> > effectively.
>> >
>> > Please suggest.
>> >
>> > Best regards,
> 


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread @Nandan@
Do you mean using Elasticsearch with Cassandra?
I am also thinking of using Apache Solr with Cassandra.
In that case I would have to create separate tables such as:
1) video_by_title, video_by_actor, video_by_year, etc.
2) After creating the tables, I would have to configure a Solr core on each table.

Is that how it would work?





On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin 
wrote:

> Why not elasticsearch for this use case? It will make your life much
> simpler
>
> > On 12 Jun 2017, at 04:40, @Nandan@ 
> wrote:
> >
> > Hi,
> >
> > Currently, I am working on data modeling for Video Company in which we
> have different types of users as well as different user functionality.
> > But currently, my concern is about Search video module based on
> different fields.
> >
> > Query patterns are as below:-
> > 1) Select video by actor.
> > 2) select video by producer.
> > 3) select video by music.
> > 4) select video by actor and producer.
> > 5) select video by actor and music.
> >
> > Note: - In short, We want to establish an advanced search module by
> which we can search by anyway and get the desired results.
> >
> > During a search , we need partial search also such that if any user can
> search "Harry" title, then we are able to give them result as all videos
> whose
> >  title contains "Harry" at any location.
> >
> > As per my ideas, I have to create separate tables such as
> video_by_actor, video_by_producer etc.. and implement solr query on all
> tables. Otherwise,
> > is there any others way by which we can implement this search module
> effectively.
> >
> > Please suggest.
> >
> > Best regards,
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Oskar Kjellin
Why not Elasticsearch for this use case? It will make your life much simpler.

> On 12 Jun 2017, at 04:40, @Nandan@  wrote:
> 
> Hi, 
> 
> Currently, I am working on data modeling for a video company in which we have
> different types of users as well as different user functionality.
> My current concern is the video search module, which must search on different
> fields.
> 
> Query patterns are as below:
> 1) Select video by actor.
> 2) Select video by producer.
> 3) Select video by music.
> 4) Select video by actor and producer.
> 5) Select video by actor and music.
> 
> Note: In short, we want to build an advanced search module that lets us
> search in any way and get the desired results.
> 
> We also need partial search: if a user searches for the title "Harry", we
> should return all videos whose title contains "Harry" at any position.
> 
> My idea is to create separate tables such as video_by_actor,
> video_by_producer, etc. and implement Solr queries on all tables. Otherwise,
> is there any other way to implement this search module
> effectively?
> 
> Please suggest.
> 
> Best regards,




Re: Using Cassandra for my usecase

2017-06-12 Thread Oskar Kjellin
You could put the tenant as a column that is part of the clustering key. That 
avoids large partitions. 

On 12 Jun 2017, at 07:14, Erick Ramirez  wrote:

>> Given my use case is cassandra the best suited one or is there any other 
>> database which suits my requirement better?
> 
> Probably not the right forum for that question. It's like walking into a Ford 
> dealership and asking if the Mustang is the best car for you. 
> 
> In any case, you would choose Cassandra because you require:
> - high availability
> - very fast reads
> - no single-point-of-failure
> - no downtime
> - you have a scale problem
> - etc
> 
>> What would be best way to implement multi-tenancy?
> 
> The "best" way is what works for your use case based on testing you've done. 
> As you already are aware in the example you provided, adding a column as the 
> tenant indicator could lead to large partitions so you need to be careful 
> about how you model your data.
> 
> Some implementations completely side-step this by distributing tenants across 
> keyspaces but that may not suit your needs.
> 
>> Given that I need to query by multiple dimensions would denormalized tables 
>> work better or should I be using materialized views?
> 
> With denormalised tables, your application needs to implement the logic for 
> batching the updates together.
> 
> With materialised views, that complexity is managed for you by C* but you 
> need to be aware of the performance impact associated with it. For example 
> with RF=3 on the base table, MV adds another RF=3 for an additional table so 
> RF=3+3. A second MV increases RF=3+3+3 and so on.
> 
>> Anything else that I need to consider based on your experiences with 
>> cassandra?
> 
> 
> Multi-tenancy can be difficult particularly for complex use cases. Test, test 
> and test. And make sure you always correctly size your cluster with enough 
> nodes.
> 
> You need to limit the number of tables to about 200 at the most (regardless 
> of the number of keyspaces). Having too many tables puts pressure on the heap 
> of each node.
> 
> Good luck!
> 
>> On Sun, Jun 11, 2017 at 2:07 AM, Govindarajan Srinivasaraghavan 
>>  wrote:
>> Hi All,
>> 
>> Just to give some background, I'm working on a project where I need to store 
>> fast incoming time series data and have REST APIs to query and serve the 
>> data to users when needed. Each record is a single JSON document of about 1kb, 
>> and the data has to be purged after a specific time period (say a few 
>> weeks or months). The incoming rate would be approximately 100k messages per 
>> second, and the biggest challenge is that the data should be queryable by 
>> multiple dimensions with sorting, paging and data dump options. 
>> 
>> I started looking into database options and felt that Cassandra might be a 
>> good choice for my use case since the requirement calls for fast writes. In 
>> order to query by multiple dimensions I had to insert the same record into 
>> multiple denormalized tables (around 8 tables). Now I need to implement 
>> multitenancy, and having an extra column in the partition key to query by 
>> tenant will not work since some tenants will have huge amounts of 
>> data compared to the rest. My other option is to have the tenant identifier 
>> appended to the table names so that I can perform per-tenant queries easily. 
>> 
>> Here are my questions for which I need some help.
>> - Given my use case is cassandra the best suited one or is there any other 
>> database which suits my requirement better?
>> - What would be best way to implement multi-tenancy?
>> - Given that I need to query by multiple dimensions would denormalized 
>> tables work better or should I be using materialized views?
>> - Anything else that I need to consider based on your experiences with 
>> cassandra?
>> 
>> Thanks
> 
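
As a rough way to put numbers on the denormalized-tables-versus-materialized-views
question, the sketch below works through the figures quoted in this thread
(100k messages per second, about 1kb each, around 8 tables). Two things are
assumptions rather than facts from the original question: RF=3 is borrowed
from Erick's example, and "1kb" is taken as 1000 bytes. Following Erick's
RF=3+3 reasoning, the count of replica writes comes out the same whether the
copies are application-maintained tables or a base table plus views; what
differs is who issues the writes. Compaction, commit log and storage overhead
are ignored.

```python
# Back-of-the-envelope write volume, using figures quoted in this thread.
MESSAGES_PER_SECOND = 100_000   # "approximately 100k messages per second"
MESSAGE_BYTES = 1_000           # "1kb", taken here as 1000 bytes (assumption)
QUERY_TABLES = 8                # "around 8 tables"
REPLICATION_FACTOR = 3          # RF=3 as in Erick's example (assumption)


def replica_writes_per_message(tables: int, rf: int) -> int:
    """Each table (or view) holding a copy of the record is written rf times."""
    return tables * rf


def replicated_bytes_per_second(rate: int, size: int, tables: int, rf: int) -> int:
    """Raw payload bytes written per second across all replicas."""
    return rate * size * replica_writes_per_message(tables, rf)


writes = replica_writes_per_message(QUERY_TABLES, REPLICATION_FACTOR)
volume = replicated_bytes_per_second(
    MESSAGES_PER_SECOND, MESSAGE_BYTES, QUERY_TABLES, REPLICATION_FACTOR
)

print(f"replica writes per message: {writes}")
print(f"replicated payload: {volume / 1e9:.1f} GB/s")
```

Changing QUERY_TABLES or REPLICATION_FACTOR shows how quickly each extra
query dimension multiplies the cluster-wide write load.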


Re: Convert single node C* to cluster (rebalancing problem)

2017-06-12 Thread Junaid Nasir
No, I didn't set it (left it at default value)
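
For context, the imbalance in the `nodetool status` output quoted below can be
put into numbers. The sketch takes the Load and "Owns (effective)" figures
from the 5 GB test cluster and computes each node's share of the data on
disk. The helper names are made up for illustration, and the values are
copied by hand rather than parsed from nodetool. The thread itself does not
establish why the shares differ from ownership; this only quantifies the gap.

```python
# Illustrative only: figures copied from the `nodetool status` output
# quoted in this thread. Function names are hypothetical.
UNITS_IN_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def load_in_mib(value: float, unit: str) -> float:
    """Convert a nodetool-style load figure to MiB."""
    return value * UNITS_IN_MIB[unit]


def load_shares(nodes: dict) -> dict:
    """Return each node's share of the total on-disk load, in percent."""
    loads = {addr: load_in_mib(value, unit) for addr, (value, unit, _owns) in nodes.items()}
    total = sum(loads.values())
    return {addr: 100.0 * mib / total for addr, mib in loads.items()}


# address -> (load value, load unit, "Owns (effective)" in percent)
nodes = {
    "10.128.2.1": (4.86, "GiB", 44.9),
    "10.128.2.10": (725.03, "MiB", 55.1),
}

shares = load_shares(nodes)
for addr, share in shares.items():
    owns = nodes[addr][2]
    print(f"{addr}: holds {share:.1f}% of the data, owns {owns}% of the ring")
```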

On Fri, Jun 9, 2017 at 3:18 AM, ZAIDI, ASAD A  wrote:

> Did you make sure auto_bootstrap property is indeed set to [true] when
> you added the node?
>
>
>
> *From:* Junaid Nasir [mailto:jna...@an10.io]
> *Sent:* Monday, June 05, 2017 6:29 AM
> *To:* Akhil Mehra 
> *Cc:* Vladimir Yudovin ; user@cassandra.apache.org
> *Subject:* Re: Convert single node C* to cluster (rebalancing problem)
>
>
>
> Not evenly. I have set up a new cluster with a subset of the data (around
> 5 GB). Using the configuration above, I am getting these results:
>
>
>
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.128.2.1   4.86 GiB    256     44.9%             e4427611-c247-42ee-9404-371e177f5f17  rack1
> UN  10.128.2.10  725.03 MiB  256     55.1%             690d5620-99d3-4ae3-aebe-8f33af54a08b  rack1
>
> is there anything else I can tweak/check to make the distribution even?
>
>
>
> On Sat, Jun 3, 2017 at 3:30 AM, Akhil Mehra  wrote:
>
> So now the data is evenly balanced in both nodes?
>
>
>
> Refer to the following documentation to get a better understanding of the
> rpc_address and the broadcast_rpc_address:
> https://www.instaclustr.com/demystifying-cassandras-broadcast_address/
> I am surprised that your node started up with rpc_broadcast_address set as
> this is an unsupported property. I am assuming you are using Cassandra
> version 3.10.
>
>
>
>
>
> Regards,
>
> Akhil
>
>
>
> On 2/06/2017, at 11:06 PM, Junaid Nasir  wrote:
>
>
>
> I am able to get it working. I added a new node with following changes
>
> #rpc_address:0.0.0.0
>
> rpc_address: 10.128.1.11
>
> #rpc_broadcast_address:10.128.1.11
>
> rpc_address was set to 0.0.0.0 (I ran into a problem previously regarding
> remote connection and made these changes:
> https://stackoverflow.com/questions/12236898/apache-cassandra-remote-access)
>
>
>
> should it be happening?
>
>
>
> On Thu, Jun 1, 2017 at 6:31 PM, Vladimir Yudovin 
> wrote:
>
> Did you run "nodetool cleanup" on the first node after the second was
> bootstrapped? It should remove rows that no longer belong to the node
> after the tokens changed.
>
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone
> 
> - Cloud Cassandra Hosting*
>
>
>
>
>
>  On Wed, 31 May 2017 03:55:54 -0400 *Junaid Nasir  >* wrote 
>
>
>
> Cassandra is supposed to make adding or removing nodes easy and to balance
> load between nodes when a change is made, but that is not working in my
> case.
>
> I have a single-node C* deployment (with 270 GB of data) and want to balance
> the data across multiple nodes. I followed this guide
> 
>
>
> `nodetool status` shows 2 nodes but load is not balanced between them
>
> Datacenter: dc1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.128.0.7   270.75 GiB  256     48.6%             1a3f6faa-4376-45a8-9c20-11480ae5664c  rack1
> UN  10.128.0.14  414.36 KiB  256     51.4%             66a89fbf-08ba-4b5d-9f10-55d52a199b41  rack1
>
> I also ran 'nodetool repair' on the new node but the result is the same.
> Any pointers would be appreciated :)
>
>
>
> conf file of new node
>
> cluster_name: 'cluster1'
>
>  - seeds: "10.128.0.7"
> num_tokens: 256
>
> endpoint_snitch: GossipingPropertyFileSnitch
>
> Thanks,
>
> Junaid
>
>
>
>
>
>
>
>
>