Carlos’ suggestion (nor yours) didn’t didn’t provide a way to query 
recently-modified documents.

His updated suggestion provides a way to get recently-modified documents, but 
not ordered.

On Jul 22, 2015, at 4:19 PM, Jack Krupansky 
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:

"No way to query recently-modified documents."

I don't follow why you say that. I mean, that was the point of the data model 
suggestion I proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of Cassandra 
3.0 might handle this use case, including taking care of the delete, 
automatically.


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
The time series doesn’t provide the access pattern I’m looking for. No way to 
query recently-modified documents.

On Jul 21, 2015, at 9:13 AM, Carlos Alonso 
<i...@mrcalonso.com<mailto:i...@mrcalonso.com>> wrote:

Hi Robert,

What about modelling it as a time serie?

CREATE TABLE document (
  docId UUID,
  doc TEXT,
  last_modified TIMESTAMP
  PRIMARY KEY(docId, last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

This way, you the lastest modification will always be the first record in the 
row, therefore accessing it should be as easy as:

SELECT * FROM document WHERE docId == <the docId> LIMIT 1;

And, if you experience diskspace issues due to very long rows, then you can 
always expire old ones using TTL or on a batch job. Tombstones will never be a 
problem in this case as, due to the specified clustering order, the latest 
modification will always be first record in the row.

Hope it helps.

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 21 July 2015 at 05:59, Robert Wille 
<rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote:
Data structures that have a recently-modified access pattern seem to be a poor 
fit for Cassandra. I’m wondering if any of you smart guys can provide 
suggestions.

For the sake of discussion, lets assume I have the following tables:

CREATE TABLE document (
        docId UUID,
        doc TEXT,
        last_modified TIMEUUID,
        PRIMARY KEY ((docid))
)

CREATE TABLE doc_by_last_modified (
        date TEXT,
        last_modified TIMEUUID,
        docId UUID,
        PRIMARY KEY ((date), last_modified)
)

When I update a document, I retrieve its last_modified time, delete the current 
record from doc_by_last_modified, and add a new one. Unfortunately, if you’d 
like each document to appear at most once in the doc_by_last_modified table, 
then this doesn’t work so well.

Documents can get into the doc_by_last_modified table multiple times if there 
is concurrent access, or if there is a consistency issue.

Any thoughts out there on how to efficiently provide recently-modified access 
to a table? This problem exists for many types of data structures, not just 
recently-modified. Any ordered data structure that can be dynamically reordered 
suffers from the same problems. As I’ve been doing schema design, this pattern 
keeps recurring. A nice way to address this problem has lots of applications.

Thanks in advance for your thoughts

Robert





Reply via email to