Re: Schema questions for data structures with recently-modified access patterns

2015-07-24 Thread Robert Wille
When performing an update, the following needs to happen: 1. Read document.last_modified 2. Get the current timestamp 3. Update document with last_modified=current timestamp 4. Insert into doc_by_last_modified with last_modified=current timestamp 5. Delete from doc_by_last_modified with
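
The steps above can be sketched in memory as follows. This is only an illustration, not the poster's code: the two dictionaries and sets stand in for the document and doc_by_last_modified tables, update_document is a hypothetical helper, and because step 5 is cut off in the archive, the sketch assumes it deletes the entry keyed by the old last_modified value read in step 1.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for the two tables.
document = {}                 # doc_id -> {"doc": body, "last_modified": ts}
doc_by_last_modified = set()  # set of (last_modified, doc_id) keys


def update_document(doc_id, body, now=None):
    """Apply the five update steps to the in-memory tables and return the new timestamp."""
    # 1. Read document.last_modified (None if the document is new).
    old_ts = document.get(doc_id, {}).get("last_modified")
    # 2. Get the current timestamp (injectable so the sketch is testable).
    ts = now if now is not None else datetime.now(timezone.utc)
    # 3. Update document with last_modified = current timestamp.
    document[doc_id] = {"doc": body, "last_modified": ts}
    # 4. Insert into doc_by_last_modified with last_modified = current timestamp.
    doc_by_last_modified.add((ts, doc_id))
    # 5. ASSUMPTION: delete the entry keyed by the old timestamp from step 1.
    if old_ts is not None and old_ts != ts:
        doc_by_last_modified.discard((old_ts, doc_id))
    return ts
```

In a real cluster these steps are separate statements, so two concurrent writers can both read the same old value in step 1 and leave an extra entry behind; the sketch does not model that race.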

Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Jack Krupansky
Concurrent updates should not be problematic. Duplicate entries should not be created. If they appear to be, explain the apparent issue so we can see whether it is a real issue. But at least from all of the details you have disclosed so far, there does not appear to be any indication that this

Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Robert Wille
Neither Carlos’ suggestion nor yours provided a way to query recently-modified documents. His updated suggestion provides a way to get recently-modified documents, but not ordered. On Jul 22, 2015, at 4:19 PM, Jack Krupansky <jack.krupan...@gmail.com>

Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Jack Krupansky
Maybe you could explain in more detail what you mean by recently modified documents, since that is precisely what I thought I suggested with descending ordering. -- Jack Krupansky On Thu, Jul 23, 2015 at 3:40 PM, Robert Wille rwi...@fold3.com wrote: Carlos’ suggestion (nor yours) didn’t

Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Robert Wille
I obviously worded my original email poorly. I guess that’s what happens when you post at the end of the day just before quitting. I want to get a list of documents, ordered from most-recently modified to least-recently modified, with each document appearing exactly once. Jack, your schema

Re: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Carlos Alonso
Ah, so your access pattern is to get all documents modified on a particular date, right? Then I think your approach is good, and to avoid duplication, why not add the docId as the first clustering column and remove the last_modified field from it? That way, your primary key would be PRIMARY

Re: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Jack Krupansky
"No way to query recently-modified documents." I don't follow why you say that. I mean, that was the point of the data model suggestion I proposed. Maybe you could clarify. I also wanted to mention that the new materialized view feature of Cassandra 3.0 might handle this use case, including taking

RE: Schema questions for data structures with recently-modified access patterns

2015-07-22 Thread Alec Collier
"No way to query recently-modified documents." I don't follow why you say that. I mean, that was the point of the data model suggestion I proposed. Maybe you could clarify. I also wanted to mention

Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
The time series doesn’t provide the access pattern I’m looking for. No way to query recently-modified documents. On Jul 21, 2015, at 9:13 AM, Carlos Alonso <i...@mrcalonso.com> wrote: Hi Robert, What about modelling it as a time series? CREATE TABLE document ( docId

Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Carlos Alonso
Hi Robert, What about modelling it as a time series? CREATE TABLE document ( docId UUID, doc TEXT, last_modified TIMESTAMP, PRIMARY KEY (docId, last_modified) ) WITH CLUSTERING ORDER BY (last_modified DESC); This way, the latest modification will always be the first record in the row,

Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Jack Krupansky
Keep the original document base table, but then the query table should have the PK as last_modified, docId, with last_modified descending, so that a query can get the n most recently modified documents. Yes, you still need to manually delete the old entry for the document in the query table if
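
A small sketch of why the old entry has to be deleted under this layout. It assumes the query table's rows can be represented as (last_modified, docId) pairs; most_recent is a hypothetical helper, not a Cassandra API.

```python
def most_recent(rows, n):
    """Return the doc ids of the n newest (last_modified, doc_id) rows, newest first."""
    # Emulates reading a table clustered by last_modified in descending order.
    ordered = sorted(rows, key=lambda row: row[0], reverse=True)
    return [doc_id for _, doc_id in ordered[:n]]


# Document "a" was modified at t=1 and again at t=3; its old row was not deleted.
rows_with_stale_entry = [(3, "a"), (2, "b"), (1, "a")]
# The same table after the old row for "a" has been removed.
rows_after_delete = [(3, "a"), (2, "b")]
```

With the stale row still present, most_recent(rows_with_stale_entry, 3) lists "a" twice; once the old row is removed, each document appears exactly once.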

Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
If last_modified is a clustering column, it needs a partitioning column, which is what date is for (although I should have named it day, and I also forgot to add the order by desc clause). This is essentially what I came up with. Still not liking how easy it is to get duplicates. On Jul 21,

Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Victor
I'm relatively new to data modeling in Cassandra, but perhaps instead of date and last_modified in your primary key for doc_by_last_modified, just use the docId. This way, you can update the last_modified and date fields against the docId and it removes the duplicate issue and obviates the