Fabian Deutsch wrote:
On Wednesday, 29.07.2009, at 13:59 -0400, Jeff Garzik wrote:
Or to take converse logic -- is it likely that service->service replication is SLOWER than client->service replication?

Every way I look at it, client->{service,service,service} replication
seems both easy... and potentially slower than alternatives :)
To elaborate a bit more...  there obviously are cases where you want
the client to be the genesis of parallel data streams into the cloud.

My point was more that there are real world situations where multiple outgoing streams from the client are significantly slower than a single stream into the cloud, followed by asking the cloud to perform further copies.
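
A rough way to see this is to compare the bytes that have to cross the client's uplink in each case. The sketch below is only an illustrative model: the function names, the idea of a single "uplink" and "cloud" bandwidth figure, and the worst-case assumption that in-cloud copies do not overlap with the upload are all assumptions made here, not anything measured on chunkd.

```c
#include <assert.h>

/* Hypothetical model; every name and parameter here is illustrative only. */

/* Seconds for the client to push all replicas itself. Every copy shares
 * the client's uplink, so the time grows with the replica count. */
double client_fanout_secs(double blob_mb, double uplink_mbps, int n_replicas)
{
    assert(uplink_mbps > 0.0 && n_replicas >= 1);
    return (blob_mb * 8.0 * n_replicas) / uplink_mbps;
}

/* Seconds for one client upload, followed by the remaining copies made
 * inside the cloud one after another (no overlap with the upload). */
double single_stream_secs(double blob_mb, double uplink_mbps,
                          double cloud_mbps, int n_replicas)
{
    double upload, copies;

    assert(uplink_mbps > 0.0 && cloud_mbps > 0.0 && n_replicas >= 1);
    upload = (blob_mb * 8.0) / uplink_mbps;
    copies = (blob_mb * 8.0 * (n_replicas - 1)) / cloud_mbps;
    return upload + copies;
}
```

With a made-up 100 MB blob, a 10 Mbit/s uplink, 1000 Mbit/s between nodes and 3 replicas, the first function gives 240 seconds and the second roughly 81.6 seconds. On a client with a fat pipe the gap shrinks, which is why both cases are worth supporting.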

Yes, I agree that we should have just one stream per BLOB to one chunkd,
but we might attach replication destinations when streaming this blob to
that chunkd. The result is that we have just one stream to a chunkd instance,
including some replication destinations, and chunkd will happily spread
the replicas.
So we keep the logic of where to replicate to away from
chunkd and leave it to the client (which can ask a third daemon) to decide
where to store the replicas.

I think we all agree on keeping the logic of where to replicate to, away from chunkd.

chunkd should be as dumb^H^H^Hsimple as possible, to permit maximum flexibility of chunkd-based applications.

chunkd-based applications will be the ones making chunk load balancing decisions, for example.


dsts[] = logic->getDstsFor(blob)
chunk->put(blob, dsts) /* Will return after successful replication. */

The local in-cloud replication strategy, such as chaining or parallel, could
be passed too, but might not be as relevant as the destinations themselves.
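
To make the idea concrete, here is a minimal in-memory sketch of a put that carries its destinations and a strategy. None of this is chunkd's real API: `struct node`, `enum repl_strategy` and `chunk_put()` are hypothetical names, and "storing" a blob is reduced to bumping counters so that the difference between the two strategies is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; this is not chunkd's real interface. */
enum repl_strategy { REPL_PARALLEL, REPL_CHAIN };

struct node {
    const char *name;
    int n_blobs;     /* blobs stored on this node */
    int n_forwards;  /* copies this node sent on to other nodes */
};

/*
 * Store one blob on 'first' and replicate it to dsts[0..n_dsts-1].
 * The caller picked the destinations; this function only follows them.
 * Returns the number of nodes holding the blob once replication is done.
 */
int chunk_put(struct node *first, struct node **dsts, size_t n_dsts,
              enum repl_strategy how)
{
    struct node *sender = first;
    size_t i;

    assert(first != NULL);
    first->n_blobs++;

    for (i = 0; i < n_dsts; i++) {
        sender->n_forwards++;      /* current sender ships one copy */
        dsts[i]->n_blobs++;
        if (how == REPL_CHAIN)
            sender = dsts[i];      /* next hop forwards onward */
    }
    return (int)(n_dsts + 1);
}
```

In parallel mode the first node sends every copy itself; in chain mode each destination passes the blob to the next one. Either way the receiving node makes no placement decision of its own.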

The more I think about this, the more I think this will simply become a configuration setting of the storage pool[1], i.e. inside tabled or nfs4d configuration.

That would permit local administrators to make a decision whether chaining (from the client!) or parallel should be used.
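
As a rough sketch of what such a setting could look like on the application side, the fragment below maps a configuration value onto a strategy. The value names "parallel" and "chain", the enum and the function are all hypothetical; neither tabled nor nfs4d has such an option today.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pool-level setting; not an existing tabled/nfs4d option. */
enum pool_repl { POOL_REPL_PARALLEL, POOL_REPL_CHAIN };

/*
 * Map the text of a configuration value onto a replication strategy.
 * Returns 0 on success, -1 if the value is missing or not recognized,
 * in which case *out is left untouched so the caller's default survives.
 */
int pool_repl_parse(const char *value, enum pool_repl *out)
{
    assert(out != NULL);

    if (value == NULL)
        return -1;
    if (strcmp(value, "parallel") == 0) {
        *out = POOL_REPL_PARALLEL;
        return 0;
    }
    if (strcmp(value, "chain") == 0) {
        *out = POOL_REPL_CHAIN;
        return 0;
    }
    return -1;
}
```

An application would initialize its strategy to parallel, since that is what works today, and only switch if the administrator's value parses cleanly.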

All of this, it must be noted, is long term discussion.

As of today, chunkd is "de facto" coded to be parallel-from-client because that's the only method possible today :)

        Jeff


[1] Or perhaps the concept of a storage pool -- a collection of chunkd's shared by multiple applications -- will have its own configuration. Another long term discussion for another day...



