RE: Mechanism to Bulk Export from Cassandra on daily Basis

2020-02-21 Thread JOHN, BIBIN
CDC from Cassandra works using Oracle GoldenGate for Big Data; we are doing that 
and publishing to Kafka. But one of the downstream systems needs batch files with 
the complete dataset.
I am evaluating some options based on previous responses.

Thanks
Bibin John

From: Peter Corless 
Sent: Friday, February 21, 2020 2:15 PM
To: user@cassandra.apache.org
Subject: Re: Mechanism to Bulk Export from Cassandra on daily Basis

Question: would daily deltas be a good use of CDC? (Rather than export entire 
tables.)

(I can understand that this might make analytics hard if you need to span 
multiple resultant daily files.)

Perhaps along with CDC, maybe set up the tables for export via a Kafka topic?

(https://docs.lenses.io/connectors/source/cassandra.html)

Or maybe some sort of exporter using Apache Spark?

https://github.com/scylladb/scylla-migrator

I'm just trying to throw out a few other ideas on how to solve the exportation 
problem.

On Fri, Feb 21, 2020, 8:45 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
I would also push for something besides a full refresh, if at all possible. It 
feels like a waste of resources to me – and not predictably scalable. 
Suggestions: use a queue to send writes to both systems. If the downstream 
system doesn’t handle TTL, perhaps set an expiration date and a purge query on 
the downstream target.

If you have to do the full refresh, perhaps a Spark job would be a decent 
solution. I would probably create a separate DC (with a lower replication 
factor and smaller number of nodes) just to handle the analytical/unload kind 
of workload (if the other functions of the cluster might be impacted by the 
unload).
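
For illustration, a minimal sketch of such a Spark export using the 
spark-cassandra-connector Java API; the contact point, keyspace and table 
names, and output path are placeholders, not details from this thread:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import com.datastax.spark.connector.japi.CassandraRow;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

    public class FullTableExport {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("cassandra-full-export")
                    // point Spark at the analytics DC's contact point
                    .set("spark.cassandra.connection.host", "10.0.0.1");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // The connector splits the table by token range, so the full
            // scan is spread across the executors rather than one node.
            javaFunctions(sc)
                    .cassandraTable("ks", "events")
                    .map(CassandraRow::toString)
                    .saveAsTextFile("hdfs:///exports/events-" + java.time.LocalDate.now());

            sc.stop();
        }
    }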

DSBulk from DataStax is very fast and scriptable, too.
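
DSBulk is a CLI, so "scriptable" can be as simple as shelling out to it. A 
hedged sketch (the keyspace, table, and output directory are placeholders; 
the flags are DSBulk's documented unload options):

    import java.io.IOException;

    public class NightlyUnload {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Equivalent to: dsbulk unload -k ks -t events -url /data/exports/events
            Process p = new ProcessBuilder(
                    "dsbulk", "unload",
                    "-k", "ks",                      // keyspace
                    "-t", "events",                  // table
                    "-url", "/data/exports/events")  // output directory (CSV by default)
                    .inheritIO()
                    .start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IllegalStateException("dsbulk unload failed, exit code " + exit);
            }
        }
    }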

Sean Durity – Staff Systems Engineer, Cassandra

From: JOHN, BIBIN <bj9...@att.com>
Sent: Wednesday, February 19, 2020 5:25 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Mechanism to Bulk Export from Cassandra on daily Basis

Thank you for the suggestion. A full refresh is currently designed because with deltas 
we cannot identify what got deleted. So downstreams prefer full data every day.


Thanks
Bibin John

From: Reid Pinchback <rpinchb...@tripadvisor.com>
Sent: Wednesday, February 19, 2020 3:14 PM
To: user@cassandra.apache.org
Subject: Re: Mechanism to Bulk Export from Cassandra on daily Basis

To the question of ‘best approach’, so far the comments have been about 
alternatives in tools.

Another axis you might want to consider is from the data model viewpoint.  So, 
for example, let’s say you have 600M rows.  You want to do a daily transfer of 
data for some reason.  First question that comes to mind is, do you need all 
the data every day?  Usually that would only be the case if all of the data is 
at risk of changing.

Generally the way I’d cut down the pain on something like this is to figure out 
if the data model currently does, or could be made to, only mutate in a limited 
subset.  Then maybe all you are transferring are the daily changes.  Systems 
based on catching up to daily changes will usually be pulling single-digit 
percentages of data volume compared to the entire storage footprint.  That’s 
not only a lot less data to pull, it’s also a lot less impact on the ongoing 
operations of the cluster while you are pulling that data.
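
As a hedged illustration of that idea (the change-log table and bucket count 
are invented for this sketch, not taken from the thread): funnel mutations 
into day-bucketed partitions, and the nightly job then reads only that day's 
buckets instead of scanning the whole table.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import java.time.LocalDate;

    public class DailyDeltaReader {
        private static final int BUCKETS = 16; // spreads one day over 16 partitions

        // Hypothetical schema:
        //   CREATE TABLE ks.changes_by_day (
        //       day text, bucket int, id text, payload text,
        //       PRIMARY KEY ((day, bucket), id));
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                PreparedStatement ps = session.prepare(
                        "SELECT id, payload FROM ks.changes_by_day WHERE day = ? AND bucket = ?");
                String today = LocalDate.now().toString(); // e.g. "2020-02-21"
                for (int bucket = 0; bucket < BUCKETS; bucket++) {
                    ResultSet rs = session.execute(ps.bind(today, bucket));
                    for (Row row : rs) {
                        // hand each changed row to the downstream feed
                        System.out.println(row.getString("id") + "," + row.getString("payload"));
                    }
                }
            }
        }
    }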

R

From: "JOHN, BIBIN" <bj9...@att.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, February 19, 2020 at 1:13 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Mechanism to Bulk Export from Cassandra on daily Basis

Message from External Sender
Team,
We have a requirement to bulk export data from Cassandra on a daily basis. The table 
contains close to 600M records and the cluster has 12 nodes. What is the best 
approach to do this?


Thanks
Bibin John




Re: Mechanism to Bulk Export from Cassandra on daily Basis

2020-02-21 Thread Peter Corless
Question: would daily deltas be a good use of CDC? (Rather than export
entire tables.)

(I can understand that this might make analytics hard if you need to span
multiple resultant daily files.)

Perhaps along with CDC, maybe set up the tables for export via a Kafka
topic?

(https://docs.lenses.io/connectors/source/cassandra.html)

Or maybe some sort of exporter using Apache Spark?

https://github.com/scylladb/scylla-migrator

I'm just trying to throw out a few other ideas on how to solve the
exportation problem.



RE: [EXTERNAL] Re: IN OPERATOR VS BATCH QUERY

2020-02-21 Thread Durity, Sean R
Batches are for atomicity, not performance.

I would do single deletes with a prepared statement. An IN clause causes extra 
work for the coordinator because multiple partitions are being impacted. So, 
the coordinator has to coordinate all nodes involved in those writes (up to the 
whole cluster). Availability and performance are compromised for multiple 
partition operations. I do not allow them.

Also – TTL at insert (or update) is a much better solution than large purge 
strategies. As someone who spent a month wrangling hundreds of billions of 
deletes, I am an ardent preacher of TTL during design time.
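
For illustration, a minimal sketch of both suggestions against the DataStax 
Java driver 3.x; the key_value table is borrowed from Sergio's message further 
down the thread, and the 30-day TTL is an arbitrary placeholder:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class SinglePartitionOps {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("ks")) {
                // Single-partition delete: only that partition's replicas are
                // involved, not a coordinator fanning out across the cluster.
                PreparedStatement delete =
                        session.prepare("DELETE FROM key_value WHERE id = ?");
                session.execute(delete.bind("some-id"));

                // TTL at write time: the row expires on its own, so no large
                // purge job is needed later.
                PreparedStatement insert = session.prepare(
                        "INSERT INTO key_value (id, value) VALUES (?, ?) USING TTL ?");
                session.execute(insert.bind("some-id", "payload", 30 * 86400)); // 30 days
            }
        }
    }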

Sean Durity

From: Attila Wind 
Sent: Friday, February 21, 2020 2:52 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: IN OPERATOR VS BATCH QUERY

Hi Sergio,

AFAIK you use batches when you want to get an "all or nothing" approach from 
Cassandra, i.e. turning multiple statements into one atomic operation.

One very typical use case for this is when you have denormalized data in 
multiple tables (optimized for different queries) but you need to modify all of 
them the same way as they were just one entity.

This means that if any of your delete statements failed for whatever reason, 
then all of your delete statements would be rolled back.

I think you don't want that overhead here, for sure...

We are not there yet with our development but we will need similar "cleanup" 
functionality soon.
I was also thinking about the IN operator for similar cases, but I am curious whether 
anyone here has a better idea...
Why does the IN operator blow up the coordinator? I do not entirely get it...

Thanks
Attila

Sergio <lapostadiser...@gmail.com> wrote (on Fri, Feb 21, 2020 at 3:44):
The current approach is delete from key_value where id = whatever, and it is 
performed asynchronously from the client.
I was thinking to reduce at least the network round-trips between client and 
coordinator with that batch approach. :)

In any case, I would test whether it will improve things or not. So when do you 
use batches then?

Best,

Sergio

On Thu, Feb 20, 2020, 6:18 PM Erick Ramirez <erick.rami...@datastax.com> wrote:
Batches aren't really meant for optimisation in the same way as RDBMS. If 
anything, it will just put pressure on the coordinator having to fire off 
multiple requests to lots of replicas. The IN operator falls into the same 
category and I personally wouldn't use it with more than 2 or 3 partitions 
because then the coordinator will suffer from the same problem.

If it were me, I'd just issue single-partition deletes and throttle it to a 
"reasonable" throughput that your cluster can handle. The word "reasonable" is 
in quotes because only you can determine that magic number for your cluster 
through testing. Cheers!
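
For illustration, a hedged sketch of that throttling using Guava's RateLimiter 
with driver 3.x; the 500 deletes/second figure stands in for the "magic number" 
you would find through testing:

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.RateLimiter;
    import java.util.List;

    public class ThrottledDeletes {
        public static void deleteAll(Session session, List<String> ids) {
            PreparedStatement delete =
                    session.prepare("DELETE FROM ks.key_value WHERE id = ?");
            RateLimiter limiter = RateLimiter.create(500.0); // permits per second
            for (String id : ids) {
                limiter.acquire();                     // blocks to enforce the rate
                session.executeAsync(delete.bind(id)); // single-partition delete
            }
        }
    }

(In a real job you would also want to cap the number of in-flight async 
futures, not just the submission rate.)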





RE: [EXTERNAL] Re: Null values in sasi indexed column

2020-02-21 Thread Durity, Sean R
I would consider building a lookup table instead. Something like:
CREATE TABLE new_lookup (
    new_lookup_partition text,
    existing_key text,
    PRIMARY KEY (new_lookup_partition)
);

For me, these are easier to understand and reason through for Cassandra 
performance and availability. I would use this approach for up to 3 or 4 
different lookup patterns. If it got to be more than that, I would be using DSE 
Search/SOLR.
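
For illustration, a sketch of reading through such a lookup table with driver 
3.x: two cheap single-partition reads instead of an index. The keyspace and 
the main table's name and key (ks.main_table, existing_key) are assumptions:

    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class LookupRead {
        public static Row findByAlternateKey(Session session, String altKey) {
            Row lookup = session.execute(
                    "SELECT existing_key FROM ks.new_lookup WHERE new_lookup_partition = ?",
                    altKey).one();
            if (lookup == null) {
                return null; // no mapping for this alternate key
            }
            return session.execute(
                    "SELECT * FROM ks.main_table WHERE existing_key = ?",
                    lookup.getString("existing_key")).one();
        }
    }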

Just be warned, I have seen teams asking for these kinds of options just because 
they are guessing at the access patterns they want. If they cannot identify their 
access patterns, I encourage them to use other technologies. Otherwise the pain 
will be great.


Sean Durity

From: Erick Ramirez 
Sent: Wednesday, February 19, 2020 6:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Null values in sasi indexed column

Rahul, in my opinion SASI is an experimental feature and isn't ready for 
primetime yet. It has some advantages over secondary indexes but if it were me, 
I'd stick with native secondary indexes. But test, test and test so you can 
make an informed decision on what works for your use case. Cheers!
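
For concreteness, the two index flavors being compared, executed through the 
Java driver (a sketch; ks.tbl and the column name col are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class IndexFlavors {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // native secondary index
                session.execute("CREATE INDEX IF NOT EXISTS tbl_col_idx ON ks.tbl (col)");
                // SASI, the experimental variant discussed above
                session.execute("CREATE CUSTOM INDEX IF NOT EXISTS tbl_col_sasi ON ks.tbl (col) "
                        + "USING 'org.apache.cassandra.index.sasi.SASIIndex'");
            }
        }
    }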


Erick Ramirez | Developer Relations
erick.rami...@datastax.com | datastax.com





RE: Mechanism to Bulk Export from Cassandra on daily Basis

2020-02-21 Thread Durity, Sean R
I would also push for something besides a full refresh, if at all possible. It 
feels like a waste of resources to me – and not predictably scalable. 
Suggestions: use a queue to send writes to both systems. If the downstream 
system doesn’t handle TTL, perhaps set an expiration date and a purge query on 
the downstream target.

If you have to do the full refresh, perhaps a Spark job would be a decent 
solution. I would probably create a separate DC (with a lower replication 
factor and smaller number of nodes) just to handle the analytical/unload kind 
of workload (if the other functions of the cluster might be impacted by the 
unload).

DSBulk from DataStax is very fast and scriptable, too.

Sean Durity – Staff Systems Engineer, Cassandra



Re: [RELEASE] Apache Cassandra 3.11.6 released

2020-02-21 Thread Michael Shuler




On 2/21/20 10:28 AM, Michael Shuler wrote:

[snip]

For the foreseeable future, the publicly announced location to download 
the latest releases will be: https://www.apache.org/dist/cassandra/


dammit.. scratch that.. this should say:

For the foreseeable future, the publicly announced location to download 
the latest releases will be: https://downloads.apache.org/cassandra/


Michael


Re: [RELEASE] Apache Cassandra 3.11.6 released

2020-02-21 Thread Michael Shuler

So a little extra background:
The ASF INFRA team made a change last week to the required release 
location and redirected all requests for the previous required release 
URL to:

  https://downloads.apache.org/cassandra//
  (previous: https://www.apache.org/dist/cassandra// )

This is where projects are required to release artifacts for it to be 
called a release.


The Cassandra project was duplicating the upload of the same tar.gz 
release artifacts to 2 locations - the above location and buried down in 
a maven repository. I understand from Mick, who is beginning to help 
with release management, that the maven upload is undesired. The maven 
tar.gz upload was to facilitate having a place to download the 
hashed/signed artifacts after a release vote (so we don't rely on some 
other non-proper location like someone's laptop or on people.a.o) and 
upload it to the proper release URL above. I don't think it was ever 
intended to be a public user's download location. We have never written 
release email announcements or put this location on the project download 
page.


The 4.0-alpha3 release, as well as the latest round of stable branch 
releases did not upload the tar.gz artifacts to maven. That tar 
shuffling was done in a dist/dev svn repository with some release build 
script changes to better fit policy, and we dropped the maven tarball 
upload, since it is redundant and not a proper release location, per ASF 
release policy. The release publishing doc from the ASF is the main 
thing we've been working from:

  https://apache.org/dev/release-publishing

Last week the Cassandra project updated the main URLs that we put on our 
website from https://www.apache.org/dist/cassandra/ to 
https://downloads.apache.org/cassandra/ per the recommendation of ASF 
INFRA. (It was being redirected anyway.)


For the foreseeable future, the publicly announced location to download 
the latest releases will be: https://www.apache.org/dist/cassandra/


As always, if you ever need old releases (only the latest are on the 
main release URL, per policy), the archive URL would be where to find 
those: https://archive.apache.org/dist/cassandra/


I hope that helps! If there is something I got wrong, please let me/us 
know. There are a boatload of ASF rules and documents, and every time I 
go looking for specific info, there are a multitude of doc locations 
that reference specific things, like releases in this case. There are 
links to other docs in the above, such as a maven-specific publish doc.


Man, I'm sorry we pulled the rug out on these - they weren't intended 
for public consumption, as far as I know. I could be completely wrong on 
that, but I believe we were "doing it wrong" as a project, once Mick 
started digging in and helping out.


My best advice is follow what is on the project download page and in the 
release announcement emails. These may change things from time to time.


Kind regards,
Michael


Re: [RELEASE] Apache Cassandra 3.11.6 released

2020-02-21 Thread Chad Helms
Well, that's where they've all been before. All our automation is pulling them 
from there. Can you point me to the details of the change you mentioned, so we 
can look at what changes we'll need to make and what additional things we'll 
have to proxy behind our firewall?

All the 4.0-alpha releases (1 & 2) are also there, btw.

On 2/21/20, 9:02 AM, "Michael Shuler"  wrote:

Why?

This release adjusted the location of the tar artifacts, so they were 
published to the normal recommended dist/release location and not 
included in maven, where I understand they are not desired to be there.






Re: [RELEASE] Apache Cassandra 3.11.6 released

2020-02-21 Thread Michael Shuler

Why?

This release adjusted the location of the tar artifacts, so they were 
published to the normal recommended dist/release location and not 
included in maven, where I understand they are not desired to be there.


Kind regards,
Michael

On 2/21/20 8:18 AM, Chad Helms wrote:

Can we get "apache-cassandra:3.11.6:bin.tar.gz" artifact published to maven 
central too, please?



Re: [RELEASE] Apache Cassandra 3.11.6 released

2020-02-21 Thread Chad Helms
Can we get "apache-cassandra:3.11.6:bin.tar.gz" artifact published to maven 
central too, please?

On 2/14/20, 5:28 PM, "Michael Shuler"  wrote:

The Cassandra team is pleased to announce the release of Apache 
Cassandra version 3.11.6.

Apache Cassandra is a fully distributed database. It is the right choice 
when you need scalability and high availability without compromising 
performance.

  http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download 
section:

  http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.11 series. As always, 
please pay attention to the release notes[2] and let us know[3] if you 
encounter any problems.

Enjoy!

[1]: CHANGES.txt 

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-3.11.6
[2]: NEWS.txt 

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-3.11.6
[3]: https://issues.apache.org/jira/browse/CASSANDRA






Re: How to get two PreparedStatement objects for the same query string

2020-02-21 Thread Oleksandr Shulgin
On Fri, Feb 21, 2020 at 2:12 PM Deepak Sharma wrote:


Have you considered using clone() (inherited from java.lang.Object) on the
object returned by session.prepare() and then setting the policies
differently on both copies?
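
Another option, sketched against the Java driver 3.x (an assumption on my 
part, suggested by the Cluster.addPrepared reference, not something from the 
thread): keep a single PreparedStatement and set the retry policy on each 
bound statement, since Statement.setRetryPolicy() overrides the cluster-wide 
policy for that one request:

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.policies.FallthroughRetryPolicy;

    public class PerStatementRetry {
        public static void run(Session session, String id) {
            PreparedStatement ps =
                    session.prepare("SELECT * FROM ks.tbl WHERE id = ?"); // prepared once

            Statement withDefaultRetries = ps.bind(id); // cluster's default retry policy

            Statement noRetries = ps.bind(id)
                    .setRetryPolicy(FallthroughRetryPolicy.INSTANCE); // never retries

            session.execute(withDefaultRetries);
            session.execute(noRetries);
        }
    }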

--
Alex


How to get two PreparedStatement objects for the same query string

2020-02-21 Thread Deepak Sharma
Hi There,

We have a use case where we need to have two separate PreparedStatement
objects (one with RetryPolicy and the other without any retry policy) for
the same query string. And when we try to create two separate
PreparedStatements, we see only one PreparedStatement getting retained (the
previous PreparedStatement gets overridden by the last PreparedStatement).
We looked at the code implementation and see that the method Cluster.addPrepared is
keyed on an MD5 hash of the query, and it overrides the old PreparedStatement
with the new PreparedStatement.

To overcome this behavior we are creating two separate queries (the second
query has an extra space at the end compared to the first) to distinguish
them, so that we end up getting two prepared statements. This hack works, but
we are not too comfortable with it. Is there a better way to achieve this?

More details in the following link with code:
https://stackoverflow.com/questions/60198114/how-to-get-two-preparedstatement-objects-for-the-same-query-string

Let me know if you have any questions.

Thanks,
Deepak