[Neo4j] Better support for large property data

Tobias Ivarsson Fri, 18 Feb 2011 06:20:40 -0800

Having tackled short strings, I feel up for taking a stab at long strings
and large binary data objects.


I know that Rick Bullotta is really interested in this, and I can imagine
others wanting to store large properties as well. I would love to get your
input on the ideas I have, as well as hearing about the ideas you might
have.

The way I see it there are two different kinds of large data objects.

The first one is long strings, or text. Imagine building a blog engine on
Neo4j: the text body of a blog post is likely to be around a thousand
characters. That is a lot of blocks in the DynamicStringStore. But you still
want to support shorter strings (the title of the post, for example) without
much overhead, so you don't want to increase the block size for the
DynamicStringStore. In your code you still want to deal with these values as
String objects, though; you don't want a different object type just because
the string happens to be longer.

The second one is large binary data objects: data objects that are too large
to allocate as a String object, or even as a byte[] object.
You want to manipulate them through some sort of streaming interface. These
data objects are also so large that you would prefer their content not to be
written to the transaction logs, because that would force Neo4j to rotate
the log extremely frequently, and since you keep the logical logs for HA and
backup, it would fill up your disks twice as quickly as necessary.
Properties like this would, for example, be used for storing images that are
included in the blog posts.


For long Strings (the first point), the solution I'm thinking of is to
replace the stringstore and arraystore with a smallstore and a largestore.
Both would be dynamic block stores, as they are today, but with different
block sizes. Both stores would then hold arrays as well as strings. The type
of the data stored in the blocks is already recorded in the property record
that references them, so there isn't much advantage in having separate block
stores for strings and arrays.
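
To put rough numbers on the trade-off, here is a small sketch of the block
arithmetic. The block payload sizes, the threshold, and all names
(BlockStoreSketch, blocksNeeded, chooseStore) are made up for illustration;
they are not Neo4j's actual values or code.

```java
// Sketch only: every constant here is a hypothetical number chosen for
// illustration, not Neo4j's real block size or configuration.
final class BlockStoreSketch {
    static final int SMALL_BLOCK_PAYLOAD = 32;   // assumed payload bytes per small block
    static final int LARGE_BLOCK_PAYLOAD = 512;  // assumed payload bytes per large block
    static final int MAX_SMALL_BLOCKS = 4;       // assumed cut-over point between the stores

    /** Number of fixed-size blocks needed for payloadBytes (ceiling division). */
    static int blocksNeeded(int payloadBytes, int blockPayload) {
        if (payloadBytes < 0 || blockPayload <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (payloadBytes + blockPayload - 1) / blockPayload;
    }

    /** Values that fit in a few small blocks stay small; everything else goes large. */
    static String chooseStore(int payloadBytes) {
        return blocksNeeded(payloadBytes, SMALL_BLOCK_PAYLOAD) <= MAX_SMALL_BLOCKS
                ? "smallstore"
                : "largestore";
    }

    public static void main(String[] args) {
        int title = 40;    // a short post title, in bytes
        int body = 1000;   // a post body of about a thousand characters
        System.out.println("title: " + blocksNeeded(title, SMALL_BLOCK_PAYLOAD)
                + " small blocks -> " + chooseStore(title));
        System.out.println("body:  " + blocksNeeded(body, SMALL_BLOCK_PAYLOAD)
                + " small blocks, or " + blocksNeeded(body, LARGE_BLOCK_PAYLOAD)
                + " large blocks -> " + chooseStore(body));
    }
}
```

With these assumed sizes the body would need 32 small blocks but only 2
large ones, while the title would waste most of a large block.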

For BLOBs (the second point), we need additions to the API, since you want
to work with these things in a streaming fashion.
I am thinking that we use java.nio.channels.ReadableByteChannel for these
properties. Why ReadableByteChannel, you ask? Why not InputStream?
First reason: an InputStream can be converted to a ReadableByteChannel, and
vice versa:
http://download.oracle.com/javase/6/docs/api/index.html?java/nio/channels/Channels.html
Second reason: ReadableByteChannel is a really simple interface (only three
methods, counting the isOpen() and close() it inherits from Channel) if you
want to write your own custom implementation.
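
Both reasons can be shown in a few lines. This is a sketch, not proposed
Neo4j code: Channels.newChannel and Channels.newInputStream are the standard
java.nio conversions, while ByteArrayChannel, ChannelConversionSketch and
readAll are names I made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

final class ChannelConversionSketch {
    /** Drains a channel through a small buffer; only for demonstrating the round trip. */
    static byte[] readAll(ReadableByteChannel channel) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(8);
        while (channel.read(buffer) != -1) {
            buffer.flip();
            out.write(buffer.array(), 0, buffer.limit());
            buffer.clear();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello blob".getBytes(StandardCharsets.UTF_8);

        // InputStream -> ReadableByteChannel
        ReadableByteChannel fromStream =
                Channels.newChannel(new ByteArrayInputStream(payload));
        System.out.println(new String(readAll(fromStream), StandardCharsets.UTF_8));

        // ReadableByteChannel -> InputStream (and back again, to reuse readAll)
        InputStream fromChannel = Channels.newInputStream(new ByteArrayChannel(payload));
        System.out.println(new String(readAll(Channels.newChannel(fromChannel)),
                StandardCharsets.UTF_8));
    }
}

/** A minimal custom channel: the three methods are read, isOpen and close. */
final class ByteArrayChannel implements ReadableByteChannel {
    private final byte[] data;
    private int position = 0;
    private boolean open = true;

    ByteArrayChannel(byte[] data) {
        this.data = data;
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        if (!open) {
            throw new ClosedChannelException();
        }
        if (position >= data.length) {
            return -1; // end of data
        }
        int count = Math.min(dst.remaining(), data.length - position);
        dst.put(data, position, count);
        position += count;
        return count;
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }
}
```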

Setting a BLOB property would then look like this:

ReadableByteChannel myBlob = ...
node.setProperty("a_blob", myBlob);

Getting would look like this:

ReadableByteChannel myBlob =
    (ReadableByteChannel) node.getProperty("a_blob");


Perhaps we could then also come up with a nice API for appending to a
BLOB property:

ReadableByteChannel moreData = ...
ReadableByteChannel myBlob =
    (ReadableByteChannel) node.getProperty("a_blob");
node.setProperty( "a_blob", BlobUtils.append(myBlob, moreData) );
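
One way the append helper could behave, seen purely from the client side, is
a channel that drains the first source and then the second. This is only a
sketch under that assumption: AppendSketch, append and readAll are
hypothetical names, and it says nothing about how the store itself would
avoid rewriting the existing data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

final class AppendSketch {
    /** Hypothetical stand-in for BlobUtils.append: reads first to its end, then second. */
    static ReadableByteChannel append(final ReadableByteChannel first,
                                      final ReadableByteChannel second) {
        return new ReadableByteChannel() {
            private boolean firstDone = false;
            private boolean open = true;

            @Override
            public int read(ByteBuffer dst) throws IOException {
                if (!open) {
                    throw new ClosedChannelException();
                }
                if (!firstDone) {
                    int count = first.read(dst);
                    if (count != -1) {
                        return count;
                    }
                    firstDone = true; // first source exhausted, fall through
                }
                return second.read(dst);
            }

            @Override
            public boolean isOpen() {
                return open;
            }

            @Override
            public void close() throws IOException {
                open = false;
                try {
                    first.close();
                } finally {
                    second.close();
                }
            }
        };
    }

    /** Drains a channel through a small buffer; only used to check the result. */
    static byte[] readAll(ReadableByteChannel channel) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(4);
        while (channel.read(buffer) != -1) {
            buffer.flip();
            out.write(buffer.array(), 0, buffer.limit());
            buffer.clear();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ReadableByteChannel existing = Channels.newChannel(
                new ByteArrayInputStream("old data, ".getBytes(StandardCharsets.UTF_8)));
        ReadableByteChannel more = Channels.newChannel(
                new ByteArrayInputStream("more data".getBytes(StandardCharsets.UTF_8)));
        byte[] combined = readAll(append(existing, more));
        System.out.println(new String(combined, StandardCharsets.UTF_8));
    }
}
```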


Comments, please.
-- 
Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
