Yes, local storage volumes on each machine.
> On May 5, 2017, at 3:25 PM, daemeon reiydelle <daeme...@gmail.com> wrote:
>
> These numbers do not match what I would expect on, e.g., AWS, so I am
> guessing you are using local storage?
>
> Daemeon C.M. Reiydelle
>
> On Fri, May 5, 2017 at 12:19 PM, Jonathan Guberman <j...@tineye.com> wrote:
> Hello,
>
> We’re currently testing Cassandra for use as a pure key-object store for data
> blobs around 10kB - 60kB each. Our use case is storing on the order of 10
> billion objects with about 5-20 million new writes per day. A written object
> will never be updated or deleted. Objects will be read at least once, some
> time within 10 days of being written. This will generally happen as a batch;
> that is, all of the images written on a particular day will be read together
> at the same time. This batch read will only happen one time; future reads
> will happen on individual objects, with no grouping, and they will follow a
> long-tail distribution, with popular objects read thousands of times per year
> but most read never or virtually never.
>
> I’ve set up a small four node test cluster and have written test scripts to
> benchmark writing and reading our data. The table I’ve set up is very simple:
> an ascii primary key column with the object ID and a blob column for the
> data. All other settings were left at their defaults.
>
> I’ve found write speeds to be very fast most of the time. However,
> periodically, writes will slow to a crawl for anywhere between half an hour
> to two hours, after which speeds recover to their previous levels. I assume
> this is some sort of data compaction or flushing to disk, but I haven’t been
> able to figure out the exact cause.
>
> Read speeds have been more disappointing. Cached reads are very fast, but
> random read speed averages about 2 MB/sec, which is too slow when we need to
> read out a batch of several million objects. I don’t think it’s reasonable to
> assume that these rows will all still be cached by the time we need to read
> them for that first large batch read.
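
As a rough check on why 2 MB/sec is a problem for the first batch read, the figures quoted above can be plugged into a quick calculation. This is only a sketch: it assumes the 2 MB/sec figure is the aggregate rate available to the batch job, that kB and MB are decimal units, and that a daily batch is between 5 and 20 million objects. The function name is made up for illustration.

```python
# Back-of-the-envelope estimate using the figures quoted in the message.
KB = 1000          # assumption: decimal kilobytes
MB = 1000 * KB

def batch_read_hours(num_objects, object_bytes, throughput_bytes_per_sec):
    """Hours needed to read num_objects of object_bytes each at a fixed rate."""
    total_bytes = num_objects * object_bytes
    return total_bytes / throughput_bytes_per_sec / 3600

READ_RATE = 2 * MB  # observed random-read rate, assumed to be the aggregate rate

# Smallest case: 5 million objects of 10 kB each.
best_case = batch_read_hours(5_000_000, 10 * KB, READ_RATE)
# Largest case: 20 million objects of 60 kB each.
worst_case = batch_read_hours(20_000_000, 60 * KB, READ_RATE)

print(f"smallest daily batch: {best_case:.1f} hours")
print(f"largest daily batch:  {worst_case:.1f} hours")
```

Under these assumptions the smallest daily batch takes roughly 7 hours to read and the largest takes close to a week, so at the upper end a day's batch could not be read back within a day.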
>
> My general question is whether anyone has any suggestions for how to improve
> performance for our use case. More specifically:
>
> - Is there a way to mitigate or eliminate the huge slowdowns I see when
> writing millions of rows?
> - Are there settings I should be using in order to maximize read speeds for
> random reads?
> - Is there a way to design our tables to improve the read speeds for the
> initial large batched reads? I was thinking of using a batch ID column that
> could be used to retrieve the data for the initial block. However, future
> reads would need to be done by the object ID, not the batch ID, so it seems
> to me I’d need to duplicate the data, once in an “objects by batch” table and
> once in a simple “objects” table. Is there a better approach than this?
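
To make the two-table idea in the last question concrete, here is a minimal sketch of what the schemas might look like, written as CQL strings in Python so they can be passed to whichever driver is in use. The table names, column names, the `bucket` column and `NUM_BUCKETS` are all hypothetical and not from the original setup. The bucket is an added assumption: it splits each day's batch across many partitions, since a single partition holding millions of blobs would likely be very large.

```python
import zlib

# Hypothetical number of partitions per batch; a placeholder to be tuned.
NUM_BUCKETS = 4096

# Lookup by object ID, matching the simple table described in the message.
CREATE_OBJECTS = """
CREATE TABLE IF NOT EXISTS objects (
    object_id ascii PRIMARY KEY,
    data blob
)
"""

# Second copy of the data, partitioned by (batch_id, bucket) for the
# initial batch read. Both key columns are assumptions for illustration.
CREATE_OBJECTS_BY_BATCH = """
CREATE TABLE IF NOT EXISTS objects_by_batch (
    batch_id ascii,
    bucket int,
    object_id ascii,
    data blob,
    PRIMARY KEY ((batch_id, bucket), object_id)
)
"""

def bucket_for(object_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map an object ID to a stable bucket number using CRC32."""
    return zlib.crc32(object_id.encode("ascii")) % num_buckets

def batch_partitions(batch_id: str, num_buckets: int = NUM_BUCKETS):
    """List every (batch_id, bucket) partition key belonging to one batch."""
    return [(batch_id, bucket) for bucket in range(num_buckets)]
```

A writer would insert each object into both tables, using `bucket_for(object_id)` for the batch copy. The batch reader would then walk `batch_partitions(batch_id)` and read one partition at a time, which may turn many random reads into fewer sequential ones. Whether that outweighs the cost of storing the data twice is something only a benchmark on the actual cluster can show.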
>
> Thank you!