Thanks Jorge. I turned off idle notifications and exposed the GC, which I
now call manually every 30 minutes or so. That has had a good impact on CPU
across the whole application.

For the hash, I switched to a Buffer-backed hash implementation. That keeps
the per-key memory overhead much lower, and access times are on par with
the native JS hash. It also allows shorter keys: 16 bytes of binary is
enough for a 128-bit key, whereas with the JS hash you would need about 24
bytes of Base62 to get the same. And since it's just a buffer and the keys
come from disk on startup, you can copy them from disk directly into the
hash without thousands of small memory copies, which makes loading a few
times faster than the original implementation. You can also predict the
size of the hash in advance and avoid expensive hash resizes (tens of
seconds for a few million keys with the JS hash), for a much faster
startup.

I have tested it with up to 60 million keys. It handles them fine and
starts up in about 60 seconds, including parsing and loading the keys from
disk. Memory overhead is also much more predictable: exactly 25 bytes per
key, all included. By comparison, JS strings alone are 24 bytes each on
64-bit, before any JS hash or value overhead. Just from this small
exercise, I think TypedArrays are really going to start making a huge
difference for JS as a dynamic language.
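A minimal sketch of what a Buffer-backed hash can look like. It assumes 16-byte binary keys that are uniformly random (e.g. digests, so four key bytes can serve as the hash) and float64 values such as file offsets. The slot layout, linear probing, and the names `BufferHash` and `setFrom` are hypothetical, not the implementation described above. It illustrates two points made there: capacity is fixed up front from the expected key count, so the table never resizes, and keys are copied straight out of a larger buffer read from disk.

```javascript
'use strict';

// Hypothetical sketch: fixed-capacity open-addressing hash in one Buffer.
// Assumes 16-byte, uniformly random binary keys and numeric values.
const KEY_BYTES = 16;
const VALUE_BYTES = 8; // float64: exact for integer offsets below 2^53
const SLOT_BYTES = 1 + KEY_BYTES + VALUE_BYTES; // flag byte: 0 = empty, 1 = used

class BufferHash {
  constructor(expectedKeys, loadFactor = 0.75) {
    // Capacity is decided once from the expected key count: no resizing later.
    this.slots = Math.max(1, Math.ceil(expectedKeys / loadFactor));
    this.buf = Buffer.alloc(this.slots * SLOT_BYTES); // zero-filled: all slots empty
    this.size = 0;
  }

  // Linear probing for the key stored at src[start .. start + 16).
  // Returns the slot holding that key, the first empty slot on its probe
  // path, or -1 if the table is full.
  _find(src, start) {
    let i = src.readUInt32LE(start) % this.slots;
    for (let n = 0; n < this.slots; n++) {
      const base = i * SLOT_BYTES;
      if (this.buf[base] === 0) return i;
      // Compare in place instead of slicing out a new Buffer per probe.
      if (this.buf.compare(src, start, start + KEY_BYTES, base + 1, base + 1 + KEY_BYTES) === 0) {
        return i;
      }
      i = (i + 1) % this.slots;
    }
    return -1;
  }

  // Insert a key read directly out of a larger buffer (e.g. a chunk from disk).
  setFrom(src, start, value) {
    const i = this._find(src, start);
    if (i < 0) throw new Error('BufferHash is full');
    const base = i * SLOT_BYTES;
    if (this.buf[base] === 0) {
      this.buf[base] = 1;
      src.copy(this.buf, base + 1, start, start + KEY_BYTES);
      this.size++;
    }
    this.buf.writeDoubleLE(value, base + 1 + KEY_BYTES);
  }

  set(key, value) { this.setFrom(key, 0, value); }

  get(key) {
    const i = this._find(key, 0);
    if (i < 0 || this.buf[i * SLOT_BYTES] === 0) return undefined;
    return this.buf.readDoubleLE(i * SLOT_BYTES + 1 + KEY_BYTES);
  }
}

// Usage: bulk-load back-to-back 16-byte keys, storing each key's position.
const crypto = require('crypto');
const keysOnDisk = crypto.randomBytes(KEY_BYTES * 3); // stand-in for bytes read from a file
const index = new BufferHash(3);
for (let off = 0; off < keysOnDisk.length; off += KEY_BYTES) {
  index.setFrom(keysOnDisk, off, off);
}
```

With the default 0.75 load factor this sketch reserves about 1.33 slots of 25 bytes per key; a higher load factor saves memory at the cost of longer probe chains.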

On Saturday, May 19, 2012 1:08:40 PM UTC+2, Jorge wrote:
>
> If this is not fixed yet, you could move the hash to a threads_a_gogo 
> thread so that the hiccups happen in that thread rather than in node's 
> event loop thread. You will still see delays when accessing the hash 
> keys, but since access to the hash will be asynchronous, they won't block 
> node. 
>
> <https://gist.github.com/2730481> 
>
> Supply a 2nd argument to run with threads_a_gogo, let it run for a while 
> and see that you still get a sane figure in "event loop ticks per second": 
>
> $ node keys.js yes 
> Multi thread 
> ... 
> ... 
> ***** Event loop ticks per second -> 366547, keys per second -> 1360 
> ... 
>
> Compare that with running it single-threaded, where the GC hiccups block 
> the event loop: 
>
> $ node keys.js 
> Single thread 
> ... 
> ... 
> ***** Event loop ticks per second -> 4138, keys per second -> 4138 
> ... 
>
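The gist's source isn't reproduced here. As a hypothetical sketch (the name `startTickMeter` is made up), an "event loop ticks per second" figure can be measured by rescheduling a callback with `setImmediate` and counting how often it runs; a GC pause on the main thread shows up as a sharp drop in the count.

```javascript
'use strict';

// Hypothetical tick meter: counts event loop turns and reports them as a
// per-second rate every `intervalMs`. Returns a function that stops it.
function startTickMeter(onReport, intervalMs = 1000) {
  let ticks = 0;
  let running = true;

  // Each setImmediate callback runs once per loop iteration, so the count
  // stalls whenever something (such as a long GC pause) blocks the loop.
  (function tick() {
    if (!running) return;
    ticks++;
    setImmediate(tick);
  })();

  const timer = setInterval(() => {
    onReport((ticks * 1000) / intervalMs);
    ticks = 0;
  }, intervalMs);

  return () => {
    running = false;
    clearInterval(timer);
  };
}
```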
> IIRC, threads_a_gogo calls the GC every 2000 or so turns. That's perhaps 
> too often, and it's why the "keys per second" figure is much lower. But 
> over a thousand keys per second on average might be plenty, depending on 
> your application. 
>
> https://github.com/xk/node-threads-a-gogo 
> -- 
> Jorge. 
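threads_a_gogo's own API isn't shown in this thread, so here is a sketch of the same idea using Node's built-in `worker_threads` module instead. That is a swapped-in technique: it did not exist in 2012 and needs Node 12 or later. The hash lives in the worker's heap, so its GC pauses happen off the main event loop, and every lookup becomes an asynchronous round trip. `HashWorker` and the message shape are hypothetical.

```javascript
'use strict';

// Swapped-in technique: Node's worker_threads (Node >= 12), not threads_a_gogo.
const { Worker } = require('worker_threads');

// Code run inside the worker thread: the big hash lives only in this heap,
// so collecting it never pauses the main event loop.
const workerSource = `
  const { parentPort } = require('worker_threads');
  const hash = new Map();
  parentPort.on('message', ({ id, op, key, value }) => {
    if (op === 'set') hash.set(key, value);
    parentPort.postMessage({ id, value: hash.get(key) });
  });
`;

class HashWorker {
  constructor() {
    this.worker = new Worker(workerSource, { eval: true });
    this.pending = new Map(); // request id -> { resolve, reject }
    this.nextId = 0;
    this.worker.on('message', ({ id, value }) => {
      const { resolve } = this.pending.get(id);
      this.pending.delete(id);
      resolve(value);
    });
    this.worker.on('error', (err) => {
      for (const { reject } of this.pending.values()) reject(err);
      this.pending.clear();
    });
  }

  // Every request is asynchronous: the main thread never touches the hash.
  _request(op, key, value) {
    return new Promise((resolve, reject) => {
      const id = this.nextId++;
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ id, op, key, value });
    });
  }

  set(key, value) { return this._request('set', key, value); }
  get(key) { return this._request('get', key); }
  close() { return this.worker.terminate(); }
}
```

Keys and values are copied between threads (structured clone) on every request, so this suits lookups that can tolerate a round trip rather than tight loops.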
>
> On May 11, 2012, at 6:07 PM, Joran Greef wrote: 
>
> > Yes, it was becoming a problem at around 3 million entries. 
> > 
> > I tried multiple smaller hashes, but it didn't help: the GC was still 
> > visiting every key in every hash. 
> > 
> > With --nouse_idle_notification there's no problem; it works great now. 
> > 
> > Tim, I was looking at nStore's file implementation and saw you were 
> > serializing reads. Were you doing this only to use a shared read buffer, 
> > or does it also work better with the disk? I was thinking that allowing 
> > multiple concurrent readers, MVCC style, would work better together 
> > with the filesystem cache. 
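A minimal sketch of the concurrent-reader side of this question, assuming the in-memory hash yields `{ offset, length }` record locations (hypothetical names). Each read gets its own buffer and uses a positional read via `fs.promises`, so no shared buffer or serialization is needed. It does not implement MVCC, and whether it actually treats the disk and filesystem cache better would need measuring.

```javascript
'use strict';

const fs = require('fs');

// Hypothetical sketch: read several records concurrently, each into its own
// buffer, using positional reads (no shared file cursor, no shared buffer).
// `locations` is an array of { offset, length } taken from the in-memory hash.
async function readRecords(path, locations) {
  const fh = await fs.promises.open(path, 'r');
  try {
    return await Promise.all(locations.map(async ({ offset, length }) => {
      const buf = Buffer.alloc(length);
      const { bytesRead } = await fh.read(buf, 0, length, offset);
      return buf.subarray(0, bytesRead); // may be short at end of file
    }));
  } finally {
    await fh.close();
  }
}
```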
> > 
> > On Friday, May 11, 2012 4:26:00 PM UTC+2, Tim Caswell wrote: 
> > So this will finally get fixed! This exact case was one of the blockers 
> > that made me abandon nStore. I was storing offsets in a JS object, and 
> > the GC spun out of control at around 1 million entries. (The other 
> > blocker was that node's fs interface was way too slow for implementing 
> > a disk-based database; I clocked 90% of CPU time in mutex locks.) 
> > 
> > On Fri, May 11, 2012 at 4:35 AM, Joran Greef <[email protected]> wrote: 
> > Jeremy, I was trying to understand why the GC was spending time when 
> > there should be no work to do. 
> > 
> > Marcel, I came across your blog post on the subject an hour ago, and I 
> > spotted the V8 issue just before your post here as well. 
> > 
> > Thanks for your suggestions. 
> > 
> > 
> > On Friday, May 11, 2012 10:57:20 AM UTC+2, Marcel wrote: 
> > There is an issue in V8 where the idle-tick GC does not pick up where 
> > the old GC left off, which leads to a lot of wasted time. See V8 issue 
> > #1458. It is fixed in bleeding_edge, but it hasn't landed in node yet, 
> > not even in 0.7.x. Try Jeremy's suggestion, or you could also try using 
> > bleeding_edge V8 with node 0.7.x. I imagine both would lead to 
> > improvements. 
> > 
> > On Fri, May 11, 2012 at 3:43 AM, Jérémy Lal <[email protected]> wrote: 
> > Idle -> GC -> visiting objects (?) 
> > 
> > Hence my suggestion: control gc() calls yourself. 
> > 
> > On 11/05/2012 10:20, Joran Greef wrote: 
> > > The thing is, the JS is doing nothing; the huge hash is just sitting 
> > > there. 
> > > 
> > > On Friday, May 11, 2012 9:57:47 AM UTC+2, kapouer wrote: 
> > > 
> > >     Isn't that the GC doing its work? 
> > >     As a workaround, you can turn it off and run it manually 
> > >     node --nouse_idle_notification --expose_gc 
> > >     > global.gc(); 
> > > 
> > >     Regards, 
> > >     Jérémy. 
> > > 
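A minimal sketch of controlling `gc()` calls yourself, as suggested here. It assumes node is started with `--expose_gc` (plus `--nouse_idle_notification` on the 2012-era V8 discussed in this thread; later V8 versions may not accept that flag). `startManualGc` is a hypothetical helper, and the 30-minute default is the interval mentioned at the top of the thread.

```javascript
'use strict';

// Hypothetical helper: run a full GC on a timer instead of relying on V8's
// idle-time GC. global.gc exists only when node is started with --expose_gc.
const DEFAULT_GC_INTERVAL_MS = 30 * 60 * 1000; // "every 30 minutes or so"

function startManualGc(intervalMs = DEFAULT_GC_INTERVAL_MS) {
  if (typeof global.gc !== 'function') {
    throw new Error('global.gc is not available; start node with --expose_gc');
  }
  const timer = setInterval(() => {
    const start = process.hrtime();
    global.gc();
    const [sec, nsec] = process.hrtime(start);
    console.log(`manual gc took ${(sec * 1e3 + nsec / 1e6).toFixed(1)} ms`);
  }, intervalMs);
  timer.unref(); // the GC timer alone shouldn't keep the process alive
  return timer;
}
```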
> > >     On 11/05/2012 09:51, Joran Greef wrote: 
> > >     > I have posted this in v8-users, but perhaps someone else here 
> > >     > will also be familiar with this: 
> > >     > 
> > >     > I am using V8 as part of Node and have written a JavaScript 
> > >     > implementation of Bitcask, using a JavaScript object as a hash to 
> > >     > keep file offsets in memory. 
> > >     > 
> > >     > This object has 7 million entries, and I'm noticing that while the 
> > >     > JS code is resting, doing nothing, V8 hits 100% CPU every few 
> > >     > seconds, continually. 
> > >     > 
> > >     > Attached is the full result of running V8 with --prof. 
> > >     > 
> > >     > And of particular interest: 
> > >     > 
> > >     > [C++]: 
> > >     >    ticks  total  nonlib   name 
> > >     >   73615   43.1%   43.1%  v8::internal::StaticMarkingVisitor::VisitUnmarkedObjects 
> > >     >   68436   40.1%   40.1%  _accept$NOCANCEL 
> > >     >    4796    2.8%    2.8%  v8::internal::FlexibleBodyVisitor<v8::internal::StaticMarkingVisitor, v8::internal::JSObject::BodyDescriptor, void>::VisitSpecialized<40> 
> > >     > 
> > >     > Should I be using many smaller hashes to keep this overhead down, 
> > >     > i.e. some sort of sparse hash implementation? Or use key mod 1000 
> > >     > to determine which hash each key should go in? 
> > >     > 
> > >     > Does V8 have limits on hash table sizes? 
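For illustration only, the "many smaller hashes, key mod 1000" idea could look like the sketch below. It uses `Map` shards rather than plain objects, and the djb2-style string hash and the `ShardedHash` name are assumptions. As reported further up the thread, splitting the hash this way did not reduce GC work, since the collector still visits every key in every shard.

```javascript
'use strict';

// Illustration of the "key mod N" idea: spread entries over N smaller Maps.
class ShardedHash {
  constructor(shardCount = 1000) {
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // djb2-style string hash (an assumption; any stable hash would do),
  // reduced mod the shard count to pick which Map holds the key.
  _shardFor(key) {
    let h = 5381;
    for (let i = 0; i < key.length; i++) {
      h = ((h * 33) ^ key.charCodeAt(i)) >>> 0;
    }
    return this.shards[h % this.shards.length];
  }

  set(key, value) { this._shardFor(key).set(key, value); }
  get(key) { return this._shardFor(key).get(key); }
}
```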
> > >     > 
> > >     > Thanks. 
> > >     > 
> > >     > -- 
> > >     > Job Board: http://jobs.nodejs.org/ 
> > >     > Posting guidelines: 
> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines <
> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines> 
> > >     > You received this message because you are subscribed to the 
> Google 
> > >     > Groups "nodejs" group. 
> > >     > To post to this group, send email to [email protected]<mailto:
> [email protected]> 
> > >     > To unsubscribe from this group, send email to 
> > >     > [email protected] <mailto:
> nodejs%[email protected]> 
> > >     > For more options, visit this group at 
> > >     > http://groups.google.com/group/nodejs?hl=en?hl=en <
> http://groups.google.com/group/nodejs?hl=en?hl=en> 
> > > 
> > 
>
>
