When performing an update, the following needs to happen:
1. Read document.last_modified
2. Get the current timestamp
3. Update document with last_modified=current timestamp
4. Insert into doc_by_last_modified with last_modified=current timestamp
5. Delete from doc_by_last_modified with last_modified=the value read in step 1
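The five steps above can be sketched as follows. This is a minimal in-memory model, not Cassandra code: the dictionaries stand in for the `document` and `doc_by_last_modified` tables, and the helper names `update_document` and `day_of` are hypothetical. It assumes doc_by_last_modified is partitioned by day, as described later in the thread.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for the two tables discussed in the thread:
#   document(docId, doc, last_modified), keyed by docId
#   doc_by_last_modified(day, last_modified, docId), partitioned by day
document = {}              # docId -> {"doc": ..., "last_modified": datetime}
doc_by_last_modified = {}  # day string -> set of (last_modified, docId) rows


def day_of(ts):
    # Partition key for the query table: the calendar day of the timestamp.
    return ts.date().isoformat()


def update_document(doc_id, new_doc, now=None):
    # 1. Read document.last_modified (None if the document is new)
    old = document.get(doc_id, {}).get("last_modified")
    # 2. Get the current timestamp
    if now is None:
        now = datetime.now(timezone.utc)
    # 3. Update document with last_modified = now
    document[doc_id] = {"doc": new_doc, "last_modified": now}
    # 4. Insert into doc_by_last_modified with last_modified = now
    doc_by_last_modified.setdefault(day_of(now), set()).add((now, doc_id))
    # 5. Delete the row keyed by the old last_modified read in step 1
    if old is not None:
        doc_by_last_modified.get(day_of(old), set()).discard((old, doc_id))
```

Run one at a time, these steps leave exactly one index row per document; the replies below are about what happens when two of them run at once.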
Concurrent updates should not be problematic. Duplicate entries should not
be created. If there appears to be a problem, explain it so we can see
whether it is a real issue.
But at least from all of the details you have disclosed so far, there does
not appear to be any indication that this
Neither Carlos’ suggestion nor yours provided a way to query
recently-modified documents.
His updated suggestion provides a way to get recently-modified documents, but
not in order.
On Jul 22, 2015, at 4:19 PM, Jack Krupansky
<jack.krupan...@gmail.com> wrote:
Maybe you could explain in more detail what you mean by recently modified
documents, since that is precisely what I thought I suggested with
descending ordering.
-- Jack Krupansky
On Thu, Jul 23, 2015 at 3:40 PM, Robert Wille <rwi...@fold3.com> wrote:
I obviously worded my original email poorly. I guess that’s what happens when
you post at the end of the day just before quitting.
I want to get a list of documents, ordered from most-recently modified to
least-recently modified, with each document appearing exactly once.
Jack, your schema
Ah, so your access pattern is to get all documents modified on a
particular date, right?
Then I think your approach is good. To avoid duplication, why not add
docId as the first clustering column and remove last_modified
from it?
That way, your primary key would be PRIMARY
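A minimal sketch of what keying the query table by day and docId would do, assuming a key of the form PRIMARY KEY ((day), docId); the in-memory dictionary and the helpers `record_update` and `read_day` are hypothetical. Because Cassandra inserts are upserts, a second write for the same (day, docId) overwrites the first instead of adding a row, but rows then come back in docId order rather than by recency.

```python
from datetime import datetime

# Hypothetical in-memory model of a query table keyed PRIMARY KEY ((day), docId).
# Writing the same (day, docId) twice overwrites the earlier row (upsert),
# so a document appears at most once per day partition.
by_day = {}  # day string -> {docId: last_modified}


def record_update(doc_id, ts):
    by_day.setdefault(ts.date().isoformat(), {})[doc_id] = ts


def read_day(day):
    rows = by_day.get(day, {})
    # Within a partition, rows come back in clustering order (by docId)...
    in_clustering_order = sorted(rows.items())
    # ...so ordering by last_modified has to be done on the client.
    return sorted(in_clustering_order, key=lambda r: r[1], reverse=True)
```

A document modified on two different days still has a row in each day partition unless the older one is deleted, so this key removes same-day duplicates but not the delete step.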
No way to query recently-modified documents.
I don't follow why you say that. I mean, that was the point of the data
model suggestion I proposed. Maybe you could clarify.
I also wanted to mention that the new materialized view feature of
Cassandra 3.0 might handle this use case, including taking
Subject: Re: Schema questions for data structures with recently-modified access
patterns
The time series doesn’t provide the access pattern I’m looking for. No way to
query recently-modified documents.
On Jul 21, 2015, at 9:13 AM, Carlos Alonso
<i...@mrcalonso.com> wrote:
Hi Robert,
What about modelling it as a time series?
CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP,
  PRIMARY KEY (docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);
This way, the latest modification will always be the first record in
the row,
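A minimal in-memory sketch of the time-series table above, assuming one partition per docId whose rows are kept newest first (the DESC clustering order); `versions`, `write_version` and `latest` are hypothetical names. It shows why the latest version of a known document is the first row of its partition.

```python
from datetime import datetime

# Hypothetical in-memory model of:
#   PRIMARY KEY (docId, last_modified) WITH CLUSTERING ORDER BY (last_modified DESC)
# Each docId partition keeps every version, newest first.
versions = {}  # docId -> list of (last_modified, doc), newest first


def write_version(doc_id, ts, doc):
    rows = versions.setdefault(doc_id, [])
    rows.append((ts, doc))
    rows.sort(key=lambda r: r[0], reverse=True)  # maintain DESC clustering order


def latest(doc_id):
    # Corresponds to reading the first row of the docId partition (LIMIT 1).
    return versions[doc_id][0]
```

Reading the latest version of one document touches a single partition, but finding which documents changed most recently would mean visiting every docId partition, which is the gap Robert points out.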
Keep the original document base table, but then the query table should have
the PK as last_modified, docId, with last_modified descending, so that a
query can get the n most recently modified documents.
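A minimal sketch of the "n most recently modified documents" query, assuming the query table is partitioned by day (as Robert notes below, last_modified needs a partition key) with rows clustered newest first. The in-memory model, the helpers `add_entry` and `most_recent`, and the `max_days` cutoff are all hypothetical.

```python
from datetime import date, datetime, timedelta

# Hypothetical in-memory model of a query table with
#   PRIMARY KEY ((day), last_modified, docId)
#   WITH CLUSTERING ORDER BY (last_modified DESC, docId DESC)
partitions = {}  # date -> list of (last_modified, docId)


def add_entry(doc_id, ts):
    partitions.setdefault(ts.date(), []).append((ts, doc_id))


def most_recent(n, today, max_days=30):
    """Return up to n (last_modified, docId) rows, newest first.

    Walks day partitions backwards from `today`, reading each one in
    clustering order, and stops after n rows or max_days partitions.
    """
    result = []
    for back in range(max_days):
        day = today - timedelta(days=back)
        # sorted(..., reverse=True) mimics (last_modified DESC, docId DESC)
        for row in sorted(partitions.get(day, []), reverse=True):
            result.append(row)
            if len(result) == n:
                return result
    return result
```

Some bound like `max_days` is needed in practice; without one, a request for n documents could walk back through arbitrarily many empty day partitions.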
Yes, you still need to manually delete the old entry for the document in
the query table if
If last_modified is a clustering column, it needs a partition key column, which
is what date is for (although I should have named it day, and I also forgot to
add the DESC clustering order clause). This is essentially what I came up with.
I'm still not happy with how easy it is to get duplicates.
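The duplicate Robert is worried about can be replayed with two updates of the same document that interleave with no locking. This is a minimal in-memory sketch, not Cassandra code: both writers read the same old last_modified, each inserts its own new row, and both delete only the same old row. The `dedupe_newest_first` helper is a hypothetical client-side workaround, not something proposed in the thread.

```python
from datetime import datetime

# Hypothetical in-memory model of doc_by_last_modified as a set of
# (last_modified, docId) rows, replaying two concurrent updates of doc "a".
index = {(datetime(2015, 7, 21, 9), "a")}
last_modified = {"a": datetime(2015, 7, 21, 9)}

# Step 1 for both writers: each reads the same old timestamp.
old_w1 = last_modified["a"]
old_w2 = last_modified["a"]
# Steps 2-4 for each writer: pick its own "now", update document, insert a row.
t_w1, t_w2 = datetime(2015, 7, 22, 16, 0), datetime(2015, 7, 22, 16, 1)
for t in (t_w1, t_w2):
    last_modified["a"] = t
    index.add((t, "a"))
# Step 5 for both writers: each deletes the same old row; neither new row is removed.
index.discard((old_w1, "a"))
index.discard((old_w2, "a"))


def dedupe_newest_first(rows):
    """Keep only the newest row per docId, ordered newest first."""
    seen, out = set(), []
    for ts, doc_id in sorted(rows, reverse=True):
        if doc_id not in seen:
            seen.add(doc_id)
            out.append((ts, doc_id))
    return out
```

The row left behind by the first writer is never cleaned up by later updates, because each later update deletes only the row matching the timestamp it read.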
On Jul 21,
I'm relatively new to data modeling in Cassandra, but perhaps instead of
date and last_modified in your primary key for doc_by_last_modified, just
use the docId. This way, you can update the last_modified and date
fields against the docId, which removes the duplicate issue and obviates
the