--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]


> On 04 Feb 2015, at 13:39, Roman Leventov <[email protected]> wrote:
> 
> 
> 
> On Wednesday, 4 February 2015 06:19:50 UTC+7, Chris Vest wrote:
> Hi Roman,
> 
> Nice write-up. One thing I noticed is that you are mixing up page cache 
> information from 2.2 with earlier Neo4j versions. We rewrote the page cache 
> from scratch in 2.2, and that’s where the 8192 byte page size was introduced. 
> It’s a sensitive number because it both controls the unit of IO with the 
> underlying storage subsystem, and it controls the coarseness of the page 
> locks.
> Could you please point out what information in the post is not valid for 2.2? 
> I was primarily basing it on the master source code, i.e. 2.2.

The quote “Neo4j uses multiple file buffer caches…” is from the 2.1.6 
documentation that you linked to. Here’s a link to the same page for 2.2.0-M03: 
http://neo4j.com/docs/2.2.0-M03/configuration-caches.html

>  
> We did consider relying on OS memory mapping, but Java will only let you map 
> 2 GBs in one go, so you’d have to do something to support files larger than 
> that anyway. Another issue is that freeing the native memory held by a 
> MappedByteBuffer requires two GCs. With our own page cache, we are better 
> able to reuse the allocated native memory, so we don’t have to worry about 
> the freeing. We also gain simpler and more controlled failure handling. I 
> hadn’t seen this one coming, but it turned out to be a nice surprise.
> Mapping more than 2 GB is possible via sun.nio.ch.FileChannelImpl.map0(). Or 
> do you have a requirement to run Neo4j on JVMs that don’t support this API?
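For reference, the 2 GB ceiling discussed above is enforced by the official API itself: FileChannel.map rejects any requested size above Integer.MAX_VALUE before touching the file. A minimal sketch (illustrative, not Neo4j code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapLimitDemo {
    // Returns true if FileChannel.map rejects a mapping of `size` bytes up front.
    public static boolean rejectsSize(long size) {
        try {
            Path tmp = Files.createTempFile("map-limit", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
                return false;
            } catch (IllegalArgumentException e) {
                // "Size exceeds Integer.MAX_VALUE" -- the per-mapping 2 GB limit
                return true;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One byte past Integer.MAX_VALUE is rejected, so files larger than
        // ~2 GB need multiple mappings with the official API.
        System.out.println(rejectsSize((long) Integer.MAX_VALUE + 1)); // true
    }
}
```

This is why any design built on the public mapping API has to stitch together multiple mappings per store file.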

I don’t think we considered this API. Initially we wanted to stick to official 
APIs, but now we’ve allowed ourselves to use sun.misc.Unsafe, and a couple of 
other internal APIs as well. If we hypothetically switched to mapping whole 
files in one go, we’d have to think of a new way to ensure record write 
atomicity. We currently use page-level locks for this. Our isolation level is 
read-committed, and this means that reads don’t take high-level read locks, so 
the onus is on our store layer to ensure that reads and writes of records are 
consistent; you don’t want people to read half-written records. If we remove 
the pages, but keep the page locks, then it wouldn’t save us the lookup cost.
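The page-lock scheme described above can be sketched roughly as follows. The class, the record layout, and the use of StampedLock are my own illustration of the idea, not the actual Neo4j implementation:

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical sketch: one lock per 8 KiB page guards every record on that
// page, so a reader can never observe a half-written record even though
// read-committed isolation takes no high-level read locks.
public class PageLockSketch {
    public static final int PAGE_SIZE = 8192;
    public static final int RECORD_SIZE = 16;

    private final byte[] page = new byte[PAGE_SIZE];
    private final StampedLock lock = new StampedLock(); // the page-level lock

    public void writeRecord(int recordOffset, byte[] record) {
        long stamp = lock.writeLock();
        try {
            System.arraycopy(record, 0, page, recordOffset, RECORD_SIZE);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public byte[] readRecord(int recordOffset) {
        long stamp = lock.readLock(); // coarse: excludes writers to the whole page
        try {
            byte[] out = new byte[RECORD_SIZE];
            System.arraycopy(page, recordOffset, out, 0, RECORD_SIZE);
            return out;
        } finally {
            lock.unlockRead(stamp);
        }
    }

    public static void main(String[] args) {
        PageLockSketch page = new PageLockSketch();
        byte[] record = new byte[RECORD_SIZE];
        record[0] = 1;
        page.writeRecord(0, record);
        System.out.println(page.readRecord(0)[0]); // 1
    }
}
```

Coarser pages mean fewer locks but more contention between unrelated records; that trade-off is one reason the 8192-byte page size is described as a sensitive number.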

> 
> I was not aware of the two GCs, but in any case the whole mechanism of 
> creating an Unmapper object and adding it to a ReferenceQueue of 
> PhantomReferences, which makes resource release rather awkward, exists because 
> the API assumes the ByteBuffer will be used in user code, so resources can be 
> released only after all references to the ByteBuffer are dead. Since you don't 
> expose ByteBuffers to Neo4j users, you can deterministically control mappings 
> directly via FileChannelImpl.map0()/unmap0() calls, or via ((DirectBuffer) 
> bb).cleaner().clean().

In embedded mode, the lifetime of a transaction can last beyond the shutdown of 
the database. You won’t be able to commit, but you can perform reads 
concurrently with the shutdown. To reduce overhead, we don’t precisely track 
references to pages, but rely on the GC to determine when it is safe to free 
the native memory.
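The GC-driven freeing described here can be sketched with a PhantomReference and a ReferenceQueue, the same machinery the JDK's own Unmapper uses. The names below are hypothetical and a plain Object stands in for a handle to native page memory:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Illustrative sketch: let the GC tell us when a page handle has become
// unreachable, and only then release the native memory backing it. This
// trades deterministic release for not having to track references precisely.
public class GcFreeSketch {
    public static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    // Retries until the GC enqueues the phantom reference (or gives up).
    public static boolean becameUnreachable(PhantomReference<?> ref) {
        try {
            for (int i = 0; i < 100; i++) {
                System.gc();
                Reference<?> r = QUEUE.remove(100);
                if (r == ref) {
                    return true; // no live references remain: safe to free native memory
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        Object pageHandle = new Object(); // stands in for a mapped-page handle
        PhantomReference<Object> ref = new PhantomReference<>(pageHandle, QUEUE);
        pageHandle = null; // drop the last strong reference
        System.out.println(becameUnreachable(ref)); // prints "true" on typical JVMs
    }
}
```

This is also why a reader that still holds a page after shutdown is safe: the memory cannot be freed while any reference keeps the handle reachable.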

> 
> Now I'm trying to understand whether there is any harm in having a lot of 
> small (basically single-page) mappings, so that you can combine your own page 
> eviction heuristics and manageability with native page mapping. See 
> http://stackoverflow.com/questions/28273878/does-linux-carry-data-structures-abstractions-for-separate-mmap-calls
If we use official APIs, then there’s the per-mapping overhead: all the pages 
would also hold on to buffers and cleaners. If I remember correctly, we currently 
spend 64 bytes per page (on 32 bit JVMs, or when compressed oops are enabled), 
plus some overhead from organising structures in the page cache. This would 
increase a good deal. If we instead use FileChannelImpl.map0, then the memory 
overhead would stay the same, and that might be interesting. I’d have to 
research how it influences error handling, though. But this only saves us a 
memcpy during page faults, doesn’t it? That doesn’t sound like a big win.
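To put the 64-bytes-per-page figure in perspective, a quick back-of-the-envelope calculation, assuming the 8192-byte pages discussed above:

```java
// Rough arithmetic for page cache metadata overhead: 64 bytes per 8 KiB page.
public class OverheadMath {
    public static final int PAGE_SIZE = 8192;
    public static final int PER_PAGE_OVERHEAD = 64;

    public static long overheadBytes(long cacheBytes) {
        return (cacheBytes / PAGE_SIZE) * PER_PAGE_OVERHEAD;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        // 131072 pages * 64 bytes = 8 MiB, i.e. 64/8192 = ~0.78% of cache size
        System.out.println(overheadBytes(oneGiB)); // 8388608
    }
}
```

At under 1% of cache size that metadata is cheap; adding a ByteBuffer plus Cleaner object per page is what would make the per-page cost "increase a good deal".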

> 
> --
> Chris Vest
> System Engineer, Neo Technology
> [ skype: mr.chrisvest, twitter: chvest ]
> 
> 
>> On 03 Feb 2015, at 17:12, Roman Leventov <[email protected]> 
>> wrote:
>> 
>> I've done some Neo4j architecture analysis: 
>> http://key-value-stories.blogspot.com/2015/02/neo4j-architecture.html
>> 
>> Corrections and answers to open questions from Neo4j developers are welcome.
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.

