"No way to query recently-modified documents."

I don't follow why you say that. I mean, that was the point of the data
model suggestion I proposed. Maybe you could clarify.

I also wanted to mention that the new materialized view feature of
Cassandra 3.0 might handle this use case, including taking care of the
delete, automatically.
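
A rough sketch of what such a view might look like, going by the design notes for 3.0 rather than a tested schema. It assumes the `date` bucket column from your second table is moved onto `document`, and the view name `doc_by_modified_date` is made up for illustration. As I understand the design, a view's primary key must contain every primary key column of the base table and may add at most one other column, so the view below is keyed by (date, docId) and cannot also cluster on last_modified. Ordering within a day would have to be done client-side.

```cql
-- Base table: one row per document. `date` is an assumed day bucket
-- (e.g. '2015-07-21') that the application rewrites on every update.
CREATE TABLE document (
    docId UUID,
    doc TEXT,
    date TEXT,
    last_modified TIMEUUID,
    PRIMARY KEY ((docId))
);

-- Hypothetical view: the same rows, partitioned by day bucket.
-- Every column in the view's primary key needs an IS NOT NULL filter.
CREATE MATERIALIZED VIEW doc_by_modified_date AS
    SELECT docId, date, last_modified
    FROM document
    WHERE date IS NOT NULL AND docId IS NOT NULL
    PRIMARY KEY ((date), docId);

-- Documents modified on a given day.
SELECT docId, last_modified
FROM doc_by_modified_date
WHERE date = '2015-07-21';
```

When an update changes `date`, the server should remove the old view row and write the new one, so each document would appear in the view at most once without a manual delete.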


-- Jack Krupansky

On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille <rwi...@fold3.com> wrote:

>  The time series doesn’t provide the access pattern I’m looking for. No
> way to query recently-modified documents.
>
>  On Jul 21, 2015, at 9:13 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>
>  Hi Robert,
>
>  What about modelling it as a time series?
>
>  CREATE TABLE document (
>   docId UUID,
>   doc TEXT,
>   last_modified TIMESTAMP,
>   PRIMARY KEY(docId, last_modified)
> ) WITH CLUSTERING ORDER BY (last_modified DESC);
>
>  This way, the latest modification will always be the first record
> in the row, so accessing it should be as easy as:
>
>  SELECT * FROM document WHERE docId = <the docId> LIMIT 1;
>
>  And if you run into disk space issues due to very long rows, you
> can always expire old ones using a TTL or a batch job. Tombstones will
> never be a problem in this case because, due to the specified clustering
> order, the latest modification will always be the first record in the row.
>
>  Hope it helps.
>
>  Carlos Alonso | Software Engineer | @calonso
> <https://twitter.com/calonso>
>
> On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com> wrote:
>
>> Data structures that have a recently-modified access pattern seem to be a
>> poor fit for Cassandra. I’m wondering if any of you smart guys can provide
>> suggestions.
>>
>> For the sake of discussion, let's assume I have the following tables:
>>
>> CREATE TABLE document (
>>         docId UUID,
>>         doc TEXT,
>>         last_modified TIMEUUID,
>>         PRIMARY KEY ((docId))
>> );
>>
>> CREATE TABLE doc_by_last_modified (
>>         date TEXT,
>>         last_modified TIMEUUID,
>>         docId UUID,
>>         PRIMARY KEY ((date), last_modified)
>> );
>>
>> When I update a document, I retrieve its last_modified time, delete the
>> current record from doc_by_last_modified, and add a new one. Unfortunately,
>> if you’d like each document to appear at most once in the
>> doc_by_last_modified table, then this doesn’t work so well.
>>
>> Documents can get into the doc_by_last_modified table multiple times if
>> there is concurrent access, or if there is a consistency issue.
>>
>> Any thoughts out there on how to efficiently provide recently-modified
>> access to a table? This problem exists for many types of data structures,
>> not just recently-modified. Any ordered data structure that can be
>> dynamically reordered suffers from the same problems. As I’ve been doing
>> schema design, this pattern keeps recurring. A nice way to address this
>> problem has lots of applications.
>>
>> Thanks in advance for your thoughts
>>
>> Robert
>>
>>
>
>
