06-Jul-2014 07:19, Wanderer пишет:
On Saturday, 5 July 2014 at 16:03:17 UTC, Dmitry Olshansky wrote:
There are trade-offs. The world is not black and white and I don't
follow 'one rule everywhere'.

This is not a trade-off at all. You suggested to keep database records
linearly, with space gaps between records to support "tiny updates".
Without serious updates, this is major waste of space. With them, your
design won't save the day, because any gap will be eventually consumed
and without fragmentation/reordering, the storage will fail.


Strawman.

FYI, nowaday popular databases aren't designed this way, exactly for the
reasons I described above. All databases I worked with (MySQL, Oracle
and Firebird) hold records in very compact way, without a single byte
gaps, with ability of fragmentation, and without total physical
ordering. So at least designers of these DBMS have agreed that your
design is not practical.

Well they got to store a link somewhere, right? That's already 4 or 8 bytes of "gap" to have in case of adding an extra field. In fact taking a peek at typical DB file I see LOTS of zeros in between real data.

Again - linked vs linear *is* a tradeoff, and it's taken not once and for all. It's good to have ability to choose, but from your posts it's apparent you don't need choice and just fine without it.

Pointers are perfectly fine as long as there is no pointer arithmetic.

Wrong. Merely holding a pointer (i.e. a physical address) is unsafe
already.

Anyhow your definition of unsafe is hard to grasp.

Non-deep serialization, or any other "preservation" of such a
struct and  GC is unable to keep the track of pointers.

You are no better with references in these kind of circumstances. GC is able to track references/pointers, that's its job.


Which is hopelessly admitting defeat. A pair may have non-trivial
comparator. And a minor step beyond that such as a pair of double and
integer and it fails completely.

I said above: in any non-trivial case, use classes instead of
overly-clever structures.

Why? I understand that where you are coming from there is simply no choice and hence the argument of 'just use this' holds water.

And this is nothing clever in a simple hierarchical composition of records stored in one piece of memory contiguously.

And if you really, really need premature
optimization, there is java.nio and buffers. Create ByteBuffer (direct
one if you need super optimized solution) and treat slices of it as
"structs".

It would be inferior to structs in many ways, including really slow binary serialization to the buffer step (Java has no way to just bit-blitting integers to byte array).

BTW I fail to see how NIO would instantly imply super optimized solution esp as it would require reconstructing fields from bytes on each access.

That's possible and easy to implement, but really not needed
in practice because all you get is 0.1% of memory saving and no gain in
speed.

Since in Java you really can't measure this stuff (as there is no way to lay things out in memory) you would excuse me to not trust on this one. Storing extra reference + an unused monitor field + vtable per a pair of 2 integers is about 50% memory wasted.


And there Java-style indirection-happy classes "shine". For instance
modeling any complex stuff with classes alone would lead to things like:
House--reference->Bedroom---reference-->Bed--reference-->Pillow

Which is incredible waste of memory and speed on something so simple.

That's not complex. That's very simple.

Oh, sure now I see let's discard things that don't seem to fit.
It's a common knowledge that Java doesn't implement composition properly, the only choice provided is delegation (i.e. keeping a reference instead of actually containing an object). There is simply no argument that's going to prove that "you really don't need composition".

It would become slightly more
complex if all house parts implemented the same root interface and basic
operations, like examining what's around and finding an object given by
its path or properties, or throwing an extra pillow or two onto the same
bed. All that is just a dream for structs.

Again your ignorance of what a struct could do shows. They could have methods just fine. C++ std::vector is a struct after all, so an extra pillow goes just fine.

More over using such data structure with references everywhere ruins performance a topic you suddenly started to dismiss.

Copy constructors or in D parlance postblits. There is also CoW and
whatnot. In fact swap doesn't create a copy in D, it just does a
bitwise swap in-place.

And here we have a tough choice for structs with pulled subdata "for
efficiency": either assignment operator copies that data too (making
routines like sort to perform horribly slow), or they perform only
shallow copy, causing undue sharing of data and violating the whole
struct "value" philosophy.

Again - sort doesn't copy values at all, it swaps. And there are nice ways to design copy semantics like CoW.

There is a reason why Java String is now eagerly copied behind the scenes - as it's now (internally) using small string optimization instead of copy-on-write. It's funny how only JVM can do things on this "level of structs" in Java.

Disagree - the moment a cache line is loaded it doesn't matter how
much you read in this line. Even a tiny read missing a cache is paid
in full.

But the amount of these "missed reads" is low, so less amount of cache
is invalidated. CPU cache is not a single page that gets invalidated as
a whole, it's more like many small subpages, each of them is treaten
individually.


I know what a CPU cache looks like.

If you're really into the low-level mumbo jumbo, here are two more
aspects for you to consider.

1. Since indirection does not require the object contents to be moved
around while sorting, these objects can be read into cache once and
never invalidated later - unlike "right in place" structs which
invalidate CPU cache each time a swap is performed. That is a huge cache
win already.

Plain wrong. Again seeing you have strong Java background, isn't particularly surprising.

2. Don't forget that new objects are created in eden space in Java -
which is essentially the stack area. Very fast, compact, sequential
memory. If your array fits in eden (and that's surely true for the
forest problem this thread has started from), then allocating array of
objects is essentially the same as allocating array of structs: object
contents aren't "scattered in memory" but are located one-after-one
without gaps between them.

Nobody told that an array is constructed in one pass or anything like that. There may easily (and in Java - surely) allocation in between. Just scratching the surface of how weak this argument is.

That greatly aids caching as well.

Suddenly you too seem to understand that sequential storage aids CPU caches.

> The main
difference between eden and "array of structs" is that the allocation of
the former never fails (assuming there's enough memory), only gets
slightly slower in the worst case, and allocation of the latter can fail
because "stack overflow error"

Ignorance.

or "too much memory fragmentation oops",
even if there's enough free memory.

Now this makes sense but not enough to prove that "references are ultimately better" or any other general statement.

And the only thing worse then copying a large file is copying a large
file fragmented in tiny splinters.

You surely meant "reading" here, not "copying". Copying large fragmented
file is as slow as copying large unfragmented file, because writing
operations are the bottleneck here.

Sequential writes are okay, seeking each block isn't.

--
Dmitry Olshansky

Reply via email to