Re: sasi index question (read timeout on many selects)

2017-02-17 Thread Benjamin Roth
Btw:

> They break incremental repair if you use CDC: https://issues.apache.org/jira/browse/CASSANDRA-12888


Not only when using CDC! You shouldn't use incremental repairs with MVs.
Never (right now).

2017-02-16 17:42 GMT+01:00 Jonathan Haddad :

> My advice to avoid them is based on the issues that have been filed in
> Jira.  Benjamin Roth is one of the only people talking about his MV usage,
> and has filed a few JIRAs discussing their problems when bootstrapping new
> nodes, as well as issues repairing.
>
> https://issues.apache.org/jira/browse/CASSANDRA-12730?jql=project%20%3D%20CASSANDRA%20and%20reporter%20%3D%20brstgt%20and%20text%20~%20%22materialized%22
>
> They also can't be altered:
> https://issues.apache.org/jira/browse/CASSANDRA-9736
>
> They may be less performant than managing the data yourself:
> https://issues.apache.org/jira/browse/CASSANDRA-10295,
> https://issues.apache.org/jira/browse/CASSANDRA-10307
>
> They're not as flexible as your own tables:
> https://issues.apache.org/jira/browse/CASSANDRA-9928,
> https://issues.apache.org/jira/browse/CASSANDRA-11194,
> https://issues.apache.org/jira/browse/CASSANDRA-12463
>
> They break incremental repair if you use CDC:
> https://issues.apache.org/jira/browse/CASSANDRA-12888
>
> I don't know why DataStax advises using them.  Perhaps ask them?
>
> Jon
>
> On Thu, Feb 16, 2017 at 7:57 AM Micha  wrote:
>
>>
>>
>> On 16.02.2017 16:33, Jonathan Haddad wrote:
>> >
>> > Regarding MVs, do not use the ones that shipped with 3.x.  They're not
>> > ready for production.  Manage it yourself by using a second table and
>> > inserting a second record there.
>> >
>>
>> Out of interest... there is a slight discrepancy between the advice not
>> to use MVs and the DataStax documentation of the feature. Or do I have
>> to use a different Cassandra version (instead of 3.9)?
>>
>>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Jonathan Haddad
My advice to avoid them is based on the issues that have been filed in
Jira.  Benjamin Roth is one of the only people talking about his MV usage,
and has filed a few JIRAs discussing their problems when bootstrapping new
nodes, as well as issues repairing.

https://issues.apache.org/jira/browse/CASSANDRA-12730?jql=project%20%3D%20CASSANDRA%20and%20reporter%20%3D%20brstgt%20and%20text%20~%20%22materialized%22

They also can't be altered:
https://issues.apache.org/jira/browse/CASSANDRA-9736

They may be less performant than managing the data yourself:
https://issues.apache.org/jira/browse/CASSANDRA-10295,
https://issues.apache.org/jira/browse/CASSANDRA-10307

They're not as flexible as your own tables:
https://issues.apache.org/jira/browse/CASSANDRA-9928,
https://issues.apache.org/jira/browse/CASSANDRA-11194,
https://issues.apache.org/jira/browse/CASSANDRA-12463

They break incremental repair if you use CDC:
https://issues.apache.org/jira/browse/CASSANDRA-12888

I don't know why DataStax advises using them.  Perhaps ask them?

Jon

On Thu, Feb 16, 2017 at 7:57 AM Micha  wrote:

>
>
> On 16.02.2017 16:33, Jonathan Haddad wrote:
> >
> > Regarding MVs, do not use the ones that shipped with 3.x.  They're not
> > ready for production.  Manage it yourself by using a second table and
> > inserting a second record there.
> >
>
> Out of interest... there is a slight discrepancy between the advice not
> to use MVs and the DataStax documentation of the feature. Or do I have
> to use a different Cassandra version (instead of 3.9)?
>
>


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha


On 16.02.2017 16:33, Jonathan Haddad wrote:
> 
> Regarding MVs, do not use the ones that shipped with 3.x.  They're not
> ready for production.  Manage it yourself by using a second table and
> inserting a second record there.
> 

Out of interest... there is a slight discrepancy between the advice not
to use MVs and the DataStax documentation of the feature. Or do I have
to use a different Cassandra version (instead of 3.9)?



Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha


On 16.02.2017 16:33, Jonathan Haddad wrote:
> I agree w/ DuyHai regarding the index.  The use case described here is a
> terrible one for SASI indexes.
> 
> Regarding MVs, do not use the ones that shipped with 3.x.  They're not
> ready for production.  Manage it yourself by using a second table and
> inserting a second record there.


Yes, thanks for pointing this out.

 Michael



Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Jonathan Haddad
I agree w/ DuyHai regarding the index.  The use case described here is a
terrible one for SASI indexes.

Regarding MVs, do not use the ones that shipped with 3.x.  They're not
ready for production.  Manage it yourself by using a second table and
inserting a second record there.
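A minimal sketch of the "second table" approach, assuming the schema described in the quoted message below (sha256 blob key, id timeuuid, data1 text). Two Python dicts stand in for the two tables; the class and method names are hypothetical, and against a real cluster each write would be two CQL INSERTs issued by the application (not shown here).

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    sha256: bytes   # partition key of the first table
    id: uuid.UUID   # timeuuid, partition key of the second table
    data1: str


class DualKeyStore:
    """Hypothetical in-memory stand-in for two tables holding the same rows."""

    def __init__(self) -> None:
        self.by_sha256: Dict[bytes, Record] = {}   # like PRIMARY KEY (sha256)
        self.by_id: Dict[uuid.UUID, Record] = {}   # like PRIMARY KEY (id)

    def insert(self, rec: Record) -> None:
        # Manual denormalization: every write goes to both tables.
        self.by_sha256[rec.sha256] = rec
        self.by_id[rec.id] = rec

    def get_by_sha256(self, key: bytes) -> Optional[Record]:
        # Single-partition lookup in the first table.
        return self.by_sha256.get(key)

    def get_by_id(self, key: uuid.UUID) -> Optional[Record]:
        # Single-partition lookup in the second table, no index needed.
        return self.by_id.get(key)


store = DualKeyStore()
payload = b"example payload"
rec = Record(hashlib.sha256(payload).digest(), uuid.uuid1(), "example")
store.insert(rec)
print(store.get_by_id(rec.id) == store.get_by_sha256(rec.sha256))  # True
```

In this sketch the application owns consistency between the two tables: if the second write failed, a lookup by id would miss the row until the write is retried.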

On Thu, Feb 16, 2017 at 7:06 AM DuyHai Doan  wrote:

> Using an MV with id as the partition key is your best bet right now. SASI
> will be too expensive for this simple use case.
>
> On Thu, Feb 16, 2017 at 3:21 PM, Micha  wrote:
>
>
>
> It's like having a table (sha256 blob primary key, id timeuuid, data1
> text, ...)
>
> So both sha256 and id are unique.
> I would like to query *either* by sha256 *or* by id.
>
> I thought this could be done with a SASI index, but it has to be done
> either with a second table (the manual way) or with an MV that has id as
> the partition key.
>
> On 16.02.2017 15:11, Benjamin Roth wrote:
> > No matter what has to be indexed here, the preferable way is most
> > probably denormalization instead of another index.
>
> So it's either inserting the data manually under another partition key,
> or creating an MV with the other key.
>
>
>


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Using an MV with id as the partition key is your best bet right now. SASI
will be too expensive for this simple use case.

On Thu, Feb 16, 2017 at 3:21 PM, Micha  wrote:

>
>
> It's like having a table (sha256 blob primary key, id timeuuid, data1
> text, ...)
>
> So both sha256 and id are unique.
> I would like to query *either* by sha256 *or* by id.
>
> I thought this could be done with a SASI index, but it has to be done
> either with a second table (the manual way) or with an MV that has id as
> the partition key.
>
> On 16.02.2017 15:11, Benjamin Roth wrote:
> > No matter what has to be indexed here, the preferable way is most
> > probably denormalization instead of another index.
>
> So it's either inserting the data manually under another partition key,
> or creating an MV with the other key.
>
>


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha


It's like having a table (sha256 blob primary key, id timeuuid, data1
text, ...)

So both sha256 and id are unique.
I would like to query *either* by sha256 *or* by id.

I thought this could be done with a SASI index, but it has to be done
either with a second table (the manual way) or with an MV that has id as
the partition key.

On 16.02.2017 15:11, Benjamin Roth wrote:
> No matter what has to be indexed here, the preferable way is most
> probably denormalization instead of another index.

So it's either inserting the data manually under another partition key,
or creating an MV with the other key.



Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
[image: Inline image 1]

On Thu, Feb 16, 2017 at 3:08 PM, Micha  wrote:

>
>
> On 16.02.2017 14:30, DuyHai Doan wrote:
> > Why index BLOB data? It does not make any sense.
>
> My partition key is a secure hash sum; I don't index a blob.


Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Benjamin Roth
No matter what has to be indexed here, the preferable way is most probably
denormalization instead of another index.

2017-02-16 15:09 GMT+01:00 DuyHai Doan :

> [image: Inline image 1]
>
> On Thu, Feb 16, 2017 at 3:08 PM, Micha  wrote:
>
>>
>>
>> On 16.02.2017 14:30, DuyHai Doan wrote:
>> > Why index BLOB data? It does not make any sense.
>>
>> My partition key is a secure hash sum; I don't index a blob.




Re: sasi index question (read timeout on many selects)

2017-02-16 Thread Micha


On 16.02.2017 14:30, DuyHai Doan wrote:
> Why index BLOB data? It does not make any sense.

My partition key is a secure hash sum; I don't index a blob.






Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Why index BLOB data? It does not make any sense.

"I thought sasi index is globally held, in contrast to the normal secondary
index.." --> Who said that? It's just wrong.
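A toy model of why a SASI query without the partition key gets expensive, assuming one replica per row and a simplified hash-based placement (md5 here, not Cassandra's real partitioner). Each node indexes only the rows it stores, so a lookup by partition key touches one node while a lookup by an indexed column has to ask every node. All names are hypothetical; this is not how Cassandra is implemented internally.

```python
import hashlib
from typing import Dict, List, Optional


class Node:
    def __init__(self) -> None:
        self.rows: Dict[bytes, dict] = {}          # partition key -> row
        self.local_index: Dict[bytes, bytes] = {}  # indexed value -> partition key (this node only)
        self.requests = 0                          # how often this node was asked


class ToyCluster:
    def __init__(self, n_nodes: int) -> None:
        self.nodes: List[Node] = [Node() for _ in range(n_nodes)]

    def owner(self, pk: bytes) -> Node:
        # Simplified placement: hash the partition key, single replica.
        h = int.from_bytes(hashlib.md5(pk).digest()[:8], "big")
        return self.nodes[h % len(self.nodes)]

    def insert(self, pk: bytes, indexed_value: bytes, row: dict) -> None:
        node = self.owner(pk)
        node.rows[pk] = row
        node.local_index[indexed_value] = pk  # index entry lives next to the data

    def get_by_partition_key(self, pk: bytes) -> Optional[dict]:
        node = self.owner(pk)  # the key tells us exactly which node to ask
        node.requests += 1
        return node.rows.get(pk)

    def get_by_indexed_value(self, value: bytes) -> Optional[dict]:
        # No partition key: any node might hold the row, so ask all of them.
        found = None
        for node in self.nodes:
            node.requests += 1
            pk = node.local_index.get(value)
            if pk is not None:
                found = node.rows[pk]
        return found

    def total_requests(self) -> int:
        return sum(n.requests for n in self.nodes)


cluster = ToyCluster(n_nodes=6)
keys = [hashlib.sha256(str(i).encode()).digest() for i in range(100)]
for i, pk in enumerate(keys):
    cluster.insert(pk, f"value-{i}".encode(), {"i": i})

for pk in keys[:10]:
    cluster.get_by_partition_key(pk)
print(cluster.total_requests())   # 10: one node per lookup

for i in range(10):
    cluster.get_by_indexed_value(f"value-{i}".encode())
print(cluster.total_requests())   # 70: 10 + 10 lookups x 6 nodes
```

Scaled up to many selects against a larger cluster, this per-query fan-out is one plausible contributor to the timeouts in the original question, and it is why the thread points towards a table (or MV) keyed by the value being looked up.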

On Thu, Feb 16, 2017 at 1:50 PM, Micha  wrote:

> Hi,
>
>
> My table has (among others) three columns which hold unique blobs.
> So I made the first column the partition key and created two SASI
> indexes on the other two columns.
>
> After inserting ca. 90M records, I'm not able to query a bunch of rows
> (sending 1 selects to the cluster) using only a SASI index. After a
> few seconds I get timeouts.
>
> I have read the documentation on SASI indexes, but I don't understand why
> this happens. Is it because I don't include the partition key in the query?
>
> I thought sasi index is globally held, in contrast to the normal
> secondary index..
>
>
> thanks for helping,
>  Michael
>
>