On 19.10.2010 15:10, Daniel Shahaf wrote:
> Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:
>> Personally, I see [FSv2] as a broad swath of API changes to align our
>> needs with the underlying storage. Trowbridge noted that our current
>> API makes it *really* difficult to implement an effective backend. I'd
>> also like to see a backend that allows for parallel PUTs during the
>> commit process. Hyrum sees FSv2 as some kind of super-key-value
>> storage with layers on top, allowing for various types of high-scaling
>> mechanisms.
> At the retreat, stefan2 also had some thoughts about this...

Without going too much into detail, the main issues are:

* Missing 3-layer abstraction: there is no distinction between
  logical data model and external representation. That makes
  it hard to optimize data arrangement on disk (order of node
  deltas etc.) or to cache index (position) information in some
  local context.

* Implementation of a "streamy" server API (good) as a fine-
  grained iteration over some node tree (bad). In a redesigned
  3-layer FS backend, I would like to see set-oriented requests
  ("get list of nodes in that folder / subtree / whatever", "fetch
  data for that list of nodes") that can be transformed in each
  layer to a similar request (or limited number of requests)
  on the respective lower layer. As a result, data on disk could
  be arranged so that many high-level requests translate into a
  small number of disk read requests asking for large chunks
  of data. That abstraction of "access planning" will benefit
  DBs and networked file I/O the most.
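
To make the two points above a bit more concrete, here is a rough
sketch of what I mean. It is not Subversion code and not a proposed
API. All names (Model, Storage, Disk, Extent, plan_reads) are made up
for illustration, and I use Python only to keep it short. The logical
layer asks for a set of nodes, the storage layer maps that set to
on-disk extents using its index, and the "access planning" step merges
neighbouring extents into as few large reads as possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical on-disk location of one node's data."""
    offset: int
    length: int


class Disk:
    """Physical layer: a byte blob that counts read requests."""

    def __init__(self, blob):
        self.blob = blob
        self.reads = 0

    def read(self, offset, length):
        self.reads += 1
        return self.blob[offset:offset + length]


def plan_reads(extents, max_gap=0):
    """Access planning: merge extents that touch (or lie within
    max_gap bytes of each other) into fewer, larger reads."""
    merged = []
    for ext in sorted(extents, key=lambda e: e.offset):
        if merged and ext.offset <= merged[-1].offset + merged[-1].length + max_gap:
            last = merged[-1]
            end = max(last.offset + last.length, ext.offset + ext.length)
            merged[-1] = Extent(last.offset, end - last.offset)
        else:
            merged.append(ext)
    return merged


class Storage:
    """Middle layer: owns the index (node id -> Extent) and turns a
    set of node ids into a small number of disk reads."""

    def __init__(self, disk, index):
        self.disk = disk
        self.index = index

    def fetch_many(self, node_ids):
        wanted = {n: self.index[n] for n in node_ids}
        result = {}
        for chunk in plan_reads(wanted.values()):
            data = self.disk.read(chunk.offset, chunk.length)
            chunk_end = chunk.offset + chunk.length
            # Slice each requested node out of the large chunk.
            for n, ext in wanted.items():
                if chunk.offset <= ext.offset and ext.offset + ext.length <= chunk_end:
                    start = ext.offset - chunk.offset
                    result[n] = data[start:start + ext.length]
        return result


class Model:
    """Top layer: logical data model, knows folders and their nodes."""

    def __init__(self, storage, children):
        self.storage = storage
        self.children = children  # folder path -> list of node ids

    def list_nodes(self, folder):
        return list(self.children.get(folder, []))

    def fetch_folder(self, folder):
        # One set-oriented request instead of one request per node.
        return self.storage.fetch_many(self.list_nodes(folder))


def build_example():
    """Lay out four toy nodes back to back and build the three layers."""
    contents = {"a": b"alpha", "b": b"beta", "c": b"gamma", "z": b"zeta"}
    blob, index = b"", {}
    for name, data in contents.items():
        index[name] = Extent(len(blob), len(data))
        blob += data
    disk = Disk(blob)
    model = Model(Storage(disk, index), {"/trunk": ["a", "b", "c"], "/tags": ["z"]})
    return disk, model
```

With this toy layout, model.fetch_folder("/trunk") asks for three
nodes, but because their data sits next to each other on disk, the
planner turns that into a single read. A per-node iteration would have
issued three. The upper layer never sees offsets, so the storage layer
stays free to rearrange the data or cache the index.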

If someone is working on a design, I would like to review it.
I've got "some" experience with that kind of data processing ...

-- Stefan^2.
