[
https://issues.apache.org/jira/browse/CASSANDRA-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482488#comment-16482488
]
Jason Brown commented on CASSANDRA-13981:
-----------------------------------------
Thanks, [~pree] and [[email protected]], for the patches. I've been
reading them, understanding the scope of the technology, and see the direction
you are going. However, I'd like to propose a slightly different direction.
Stepping back, the pcj library is divided into two parts: the higher-level pcj
components (as used in the version of this patch as previously posted), and the
lower-level API, called LLPL in the library. LLPL is much smaller than the pcj
parts, and offers a direct and simple way to just write bytes into a backing
array from the persistent memory. In my opinion, this will be far more natural
for the cassandra community and developers, and provides a more direct access
to the storage bytes. We already have lots of serialization code, and we
understand that quite well; thus I'd like to keep leveraging that lower-level
thinking. We will need to write custom, non-generic data structures (like we
already have for our LSM-based engine), but I only see this as a complete win. We
need to optimize, in every way we reasonably can, our data structures as we are
a database, after all. LLPL has some rough edges wrt code optimization and we
will want to modify the transaction model a bit, but I suspect the pcj authors
will work with us toward that end.
With this as background, I've started sketching out a direction I think we
should pursue. This sketch primarily shows the direction for thinking about
serialization and memory allocation using LLPL. DISCLAIMER: this code doesn't
compile, is not syntactically correct, and is wholly incomplete. It should be
thought of as a loose blueprint (sketch!) for discussion.
The sketch comprises the following concepts:
- thread per sub-range (to reduce lock contention in the data structures).
This is kinda inspired by the thread-per-core notion, but on a smaller scale.
({{TreeManager}} in this patch is a rudimentary dispatch class.)
- how partitions should be stored - allocate a {{MemoryRegion}} from the LLPL
allocator, wrap it with a {{DataOutputPlus}}, and write as we normally would.
- rough implementations of the data structures for the primary index and
storing rows. A longer treatment of this topic will be in the design doc (see
below), but using a tree for the primary index (for partition look up) and then
a map for the cql rows is the basic idea. I mostly want to show the ideas
around serialization so I didn't actually implement the index or the map -
except for the leaf/entry nodes which show how the serialization/data layout
fits into the data structure.
- explicitly pass the transaction around on writes (instead of looking for it
in a {{ThreadLocal}}, as the pcj transactions do).
||13981-sketch-1||
|[branch|https://github.com/jasobrown/cassandra/tree/13981-sketch-1]|
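To make the concepts listed above concrete, here is a minimal illustration that is not taken from the branch. Everything in it is an assumption made for discussion: a plain {{ByteBuffer}} stands in for an LLPL {{MemoryRegion}}, {{DataOutputStream}} stands in for {{DataOutputPlus}}, and {{Txn}}, {{SubRange}} and {{SubRangeSketch}} are hypothetical names. It shows only the shape of the idea: one thread per sub-range, allocate a region, wrap it, write as usual, and pass the transaction explicitly.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SubRangeSketch {
    /** Hypothetical transaction handle, passed explicitly instead of read from a ThreadLocal. */
    static final class Txn {
        final List<ByteBuffer> allocated = new ArrayList<>();
    }

    /** One token sub-range; only its own thread touches 'regions', so no locking is needed. */
    static final class SubRange {
        final ExecutorService thread = Executors.newSingleThreadExecutor();
        final List<ByteBuffer> regions = new ArrayList<>();
    }

    /** Adapts a region (here a ByteBuffer stand-in) to an OutputStream. */
    static final class RegionOutputStream extends OutputStream {
        private final ByteBuffer region;
        RegionOutputStream(ByteBuffer region) { this.region = region; }
        @Override public void write(int b) { region.put((byte) b); }
        @Override public void write(byte[] b, int off, int len) { region.put(b, off, len); }
    }

    private final SubRange[] subRanges;

    SubRangeSketch(int count) {
        subRanges = new SubRange[count];
        for (int i = 0; i < count; i++) subRanges[i] = new SubRange();
    }

    /** Maps a token to the index of the sub-range that owns it (non-negative for any token). */
    int indexFor(long token) {
        return (int) Math.floorMod(token, (long) subRanges.length);
    }

    /** Allocates a region sized for the record, wraps it, and writes key and value into it. */
    static ByteBuffer writePartition(Txn txn, byte[] key, byte[] value) throws IOException {
        int size = Short.BYTES + key.length + Integer.BYTES + value.length;
        ByteBuffer region = ByteBuffer.allocateDirect(size); // assumption: stand-in for the allocator
        txn.allocated.add(region);                           // tracked so a rollback could free it
        try (DataOutputStream out = new DataOutputStream(new RegionOutputStream(region))) {
            out.writeShort(key.length);
            out.write(key);
            out.writeInt(value.length);
            out.write(value);
        }
        region.flip(); // make the written bytes readable from position 0
        return region;
    }

    /** Dispatches the write to the thread that owns the token's sub-range. */
    Future<ByteBuffer> write(long token, byte[] key, byte[] value) {
        SubRange owner = subRanges[indexFor(token)];
        return owner.thread.submit(() -> {
            Txn txn = new Txn();
            ByteBuffer region = writePartition(txn, key, value);
            owner.regions.addAll(txn.allocated); // "commit": publish the regions to the owner
            return region;
        });
    }

    void shutdown() {
        for (SubRange s : subRanges) s.thread.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SubRangeSketch sketch = new SubRangeSketch(4);
        ByteBuffer region = sketch.write(42L, new byte[] {'k'}, new byte[] {1, 2, 3}).get();
        System.out.println("bytes written: " + region.remaining());
        sketch.shutdown();
    }
}
```

Because each sub-range is served by a single thread, the structures it owns need no locks. A real version would replace the in-heap list with the persistent tree/map and would have to define what commit and rollback mean for the allocated regions.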
I am proposing this sketch as a starting point for discussion, along with a
forthcoming design doc to help us work out more high-level details of how
cassandra as a main memory database should look. I'm working on the design doc now.
It will explore how we can have a pluggable storage engine implementation that
allows cassandra to run as a main memory database using persistent memory,
while supporting the existing behaviors of cassandra in that kind of system.
> Enable Cassandra for Persistent Memory
> ---------------------------------------
>
> Key: CASSANDRA-13981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13981
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Preetika Tyagi
> Assignee: Preetika Tyagi
> Priority: Major
> Fix For: 4.0
>
> Attachments: in-mem-cassandra-1.0.patch, in-mem-cassandra-2.0.patch,
> readme.txt, readme2_0.txt
>
>
> Currently, Cassandra relies on disks for data storage and hence it needs data
> serialization, compaction, bloom filters and partition summary/index for
> speedy access of the data. However, with persistent memory, data can be
> stored directly in the form of Java objects and collections, which can
> greatly simplify the retrieval mechanism of the data. What we are proposing
> is to make use of faster and scalable B+ tree-based data collections built
> for persistent memory in Java (PCJ: https://github.com/pmem/pcj) and enable a
> complete in-memory version of Cassandra, while still keeping the data
> persistent.