On Wed, Dec 23, 2009 at 12:31 AM, Nathan Kurz <[email protected]> wrote:

>> Whereas, using the filesystem really requires a file-flat data
>> structure?
>
> I guess it depends on your point of view: it would be hard (but not
> impossible) to do true objects in an mmapped file, but it would be
> very easy to do has-a type relationships using file offsets as
> pointers.  I tend to have a data-centric (rather than
> object-centric) point of view, but from here I don't see any data
> structures that would be significantly more difficult.

Interesting -- I guess if you made all the pointers relative (ie,
interpreted them as file offsets when reading them), then you could
build arbitrary structures.
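
A minimal sketch of that idea, assuming a made-up file layout (a
4-byte header holding the offset of the first node, then nodes of
value + next-offset, with offset 0 meaning "end of list").  The names
write_list, read_list and roundtrip are hypothetical, not anything
from Lucene or Lucy:

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<I")   # offset of the first node (0 = empty list)
NODE = struct.Struct("<iI")    # value, offset of the next node (0 = end)


def write_list(path, values):
    """Serialize values as a linked list whose 'pointers' are file offsets."""
    buf = bytearray(HEADER.size)       # reserve room for the header
    next_off = 0
    for v in reversed(values):         # build back to front so 'next' is known
        off = len(buf)
        buf += NODE.pack(v, next_off)
        next_off = off
    HEADER.pack_into(buf, 0, next_off)
    with open(path, "wb") as f:
        f.write(buf)


def read_list(path):
    """Walk the list through a read-only mapping; no address fix-ups needed."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        out = []
        (off,) = HEADER.unpack_from(m, 0)
        while off:
            value, off = NODE.unpack_from(m, off)
            out.append(value)
        return out


def roundtrip(values):
    """Write values to a temporary file and read them back via mmap."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "list.bin")
        write_list(path, values)
        return read_list(path)
```

Because every "pointer" is an offset into the file, the mapping can
land at a different address in each process and the structure still
reads the same.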

> Do you have a link that explains the FST you refer to?  I'm
> searching, and not finding anything that's a definite match.  "Field
> select table"?

Sorry -- FST = finite state transducer.  It's a finite state machine
with an optional "output" attached to each edge.  When used for the
terms index, I think it'd be a symmetric letter trie -- ie, both
prefixes and suffixes are shared, with the "middle" part of the FST
producing the outputs (= index information for the term that uniquely
crosses that edge).
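
To make "outputs along each edge" concrete, here is a simplified
sketch.  It assumes non-negative integer outputs that are summed along
the path, and it only shares prefixes (a plain trie), so it is not a
full minimized FST and not how any particular library does it.  Node
and TermOutputs are hypothetical names:

```python
class Node:
    def __init__(self):
        self.edges = {}         # char -> [edge_output, child Node]
        self.final = False      # True if a term ends at this node
        self.final_output = 0   # leftover output emitted when a term ends here


class TermOutputs:
    """Letter trie whose edges carry outputs; a term's value is their sum."""

    def __init__(self):
        self.root = Node()

    def insert(self, term, value):
        node, remaining = self.root, value
        for ch in term:
            edge = node.edges.get(ch)
            if edge is None:
                # New edge: emit everything still owed as early as possible.
                edge = node.edges[ch] = [remaining, Node()]
                remaining = 0
            else:
                # Shared edge: keep only the common part here and push the
                # excess down so terms already stored keep their totals.
                common = min(edge[0], remaining)
                pushed = edge[0] - common
                if pushed:
                    child = edge[1]
                    for e in child.edges.values():
                        e[0] += pushed
                    if child.final:
                        child.final_output += pushed
                edge[0] = common
                remaining -= common
            node = edge[1]
        node.final = True
        node.final_output = remaining

    def lookup(self, term):
        node, total = self.root, 0
        for ch in term:
            edge = node.edges.get(ch)
            if edge is None:
                return None
            total += edge[0]
            node = edge[1]
        return total + node.final_output if node.final else None
```

In a terms index the integer would stand in for something like a file
pointer to the term's postings; sharing suffixes as well would need a
minimization step that this sketch leaves out.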

>> Ie, "going through the filesystem" and "going through shared
>> memory" are two alternatives for enabling efficient process-only
>> concurrency models.  They have interesting tradeoffs (I'll answer
>> more in 2026), but the fact that one of them is backed by a file by
>> the OS seems like a salient difference.
>
> For me, file backing doesn't seem like a big difference.  Fast
> moving changes will never hit disk, and I presume there is some way
> you can convince the system never to actually write out the slow
> changes (maybe mmap on a RamFS?).

What are fast & slow changes here?  Fast = new segments that get
created but then merged away before moving to stable storage?

> I think the real difference is between sharing between threads and
> sharing between processes --- basically, whether or not you can
> assume that the address space is identical in all the 'sharees'.

Yes, process-only concurrency seems like the big difference.

> I'll mention that, given the New Year, at first I thought 2026 was
> your realistic time estimate rather than a tracking number.

Heh ;)

> I started thinking about how one could do objects with mmap, and
> came up with an approach that doesn't quite answer that question but
> might actually work out well for other problems: you could literally
> compile your index and link it in as a shared library.  Each term
> would be a symbol, and you'd use 'dlsym' to find the associated
> data.
>
> It's possible that you could even use library versioning to handle
> updates, and stuff like RTLD_NEXT to handle multiple
> segments. Perhaps a really bad idea, but I find it an intriguing
> one.  I wonder how fast using libdl would be compared to writing
> your own lookup tables.  I'd have to guess it's fairly efficient.

That is a wild idea!  I wonder how dlsym represents its information...
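
For a feel of what lookup-by-symbol-name looks like from the caller's
side, here is a small sketch using Python's ctypes, which on POSIX
systems goes through dlopen/dlsym.  It assumes a POSIX platform and
resolves the existing C library symbol strlen instead of a compiled
index (building one is left out); lookup_symbol is a hypothetical
helper:

```python
import ctypes

# On POSIX, CDLL(None) opens the running program itself, so symbols from
# libraries it already links against (such as libc) can be resolved.
_program = ctypes.CDLL(None)


def lookup_symbol(name):
    """Return the foreign function named 'name', or None if not found."""
    try:
        return getattr(_program, name)   # attribute access resolves the symbol
    except AttributeError:
        return None


strlen = lookup_symbol("strlen")
if strlen is not None:
    # Declare the C signature: size_t strlen(const char *)
    strlen.argtypes = [ctypes.c_char_p]
    strlen.restype = ctypes.c_size_t
```

With an index compiled as a shared library, each term would be
resolved the same way, with the returned address pointing at that
term's data rather than at a function.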

Mike
