Re: About the relationship between the sstable compaction and the read path

2019-01-09 Thread Jinhua Luo
> We stop at the memtable if we know that’s all we need. This depends on a lot > of factors (schema, point read vs slice, etc) The code seems to search sstables without checking whether the query is already satisfied by the memtable alone. Could you point out the related code snippets for what you

Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Dor Laor
On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R wrote: > I think you could consider option C: Create a (new) analytics DC in > Cassandra and run your spark nodes there. Then you can address the scaling > just on that DC. You can also use fewer vnodes, only replicate certain > keyspaces, etc. in

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Jonathan Haddad
> I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results. When in doubt, benchmark. Good luck, Jon On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos wrote: > Losing atomic updates is a good point, but in my use case it's not a > problem, since I always

Re: Cassandra and Apache Arrow

2019-01-09 Thread Jonathan Haddad
Not sure why they put that in there, it's definitely misleading. There's nothing Arrow-related in Cassandra. There's an open JIRA, but nothing has been committed yet: https://issues.apache.org/jira/browse/CASSANDRA-9259 On Wed, Jan 9, 2019 at 3:48 PM Tomas Bartalos wrote: > There is a diagram

Re: Cassandra and Apache Arrow

2019-01-09 Thread Tomas Bartalos
There is a diagram on the homepage displaying Cassandra (with other storages) as a source of data. https://arrow.apache.org/img/shared.png Which made me think there should be some integration... On Thu, 10 Jan 2019, 12:38 am Jonathan Haddad wrote: > Where are you seeing that it works with Cassandra?

Re: Cassandra and Apache Arrow

2019-01-09 Thread Jonathan Haddad
Where are you seeing that it works with Cassandra? There's no mention of it under https://arrow.apache.org/powered_by/, and on the homepage it only says that a Cassandra developer worked on it. We (unfortunately) don't do anything with it at the moment. On Wed, Jan 9, 2019 at 3:24 PM Tomas

Cassandra and Apache Arrow

2019-01-09 Thread Tomas Bartalos
I’ve read a lot of nice things about the Apache Arrow in-memory columnar format. On their homepage they mention Cassandra as a possible storage which could interoperate with Arrow. Unfortunately I was not able to find any working example which would demonstrate their cooperation. My use case: I’m

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Tomas Bartalos
Losing atomic updates is a good point, but in my use case it's not a problem, since I always overwrite the whole record (no partial updates). I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results. When I update one row with 10 null columns it will

Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Goutham reddy
Thanks Sean. But what if I want to have both Spark and Elasticsearch with Cassandra as separate data centers? Does that cause any overhead? On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R wrote: > I think you could consider option C: Create a (new) analytics DC in > Cassandra and run your spark

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Durity, Sean R
I think you could consider option C: Create a (new) analytics DC in Cassandra and run your spark nodes there. Then you can address the scaling just on that DC. You can also use fewer vnodes, only replicate certain keyspaces, etc. in order to perform the analytics more efficiently. Sean Durity

Re: About the relationship between the sstable compaction and the read path

2019-01-09 Thread Jeff Jirsa
You’re comparing single machine key/value stores to a distributed db with a much richer data model (partitions/slices, statics, range reads, range deletions, etc). They’re going to read very differently. Instead of explaining why they’re not like rocks/ldb, how about you tell us what you’re

Re: How seed nodes are working and how to upgrade/replace them?

2019-01-09 Thread Jonathan Ballet
On Tue, 8 Jan 2019 at 18:29, Jeff Jirsa wrote: > Given Consul's popularity, seems like someone could make an argument that > we should be shipping a consul-aware seed provider. > Elasticsearch has a very handy dedicated file-based discovery system:

Re: How seed nodes are working and how to upgrade/replace them?

2019-01-09 Thread Jonathan Ballet
On Tue, 8 Jan 2019 at 18:39, Jeff Jirsa wrote: > On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet wrote: > >> Hi Jeff, >> >> thanks for answering most of my points! >> From the reloadseeds' ticket, I followed to >> https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very >>