On Tue, Apr 20, 2010 at 1:04 AM, Ron Burk <[email protected]> wrote:

>> Out of curiosity, may I ask why a "function call per token" is
>> something bad for multi-core performance?
>
> That aspect is independent of multiple cores.
> If you implement an efficient lexer (e.g., a minimized
> DFA), then the overhead of a function call tends to be
> a significant percentage of the per-token work of the
> lexer (based on my experience processing C/C++
> source -- YMMV wrt language, though I doubt by
> much).
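To make that per-token arithmetic concrete, here is a minimal sketch (my own illustration, not ANTLR's generated code, and the DFA tables are made up): in a table-driven DFA lexer the per-character work is a couple of array loads, so for short tokens the nextToken() call/return itself becomes a noticeable slice of the per-token cost.

```java
// Hypothetical 2-state DFA lexer over ASCII lowercase identifiers.
// Token types are elided: nextToken() just returns the token's length.
public final class DfaLexer {
    // state 0 = start, state 1 = inside an identifier, -1 = no transition
    private static final int[][] TRANS = new int[2][128];
    static {
        for (int c = 0; c < 128; c++) {
            boolean letter = c >= 'a' && c <= 'z';
            TRANS[0][c] = letter ? 1 : -1;
            TRANS[1][c] = letter ? 1 : -1;
        }
    }

    private final char[] input; // assumed plain ASCII for this sketch
    private int pos;

    public DfaLexer(String text) { this.input = text.toCharArray(); }

    // One function call per token: returns the next identifier's length,
    // skipping separator characters; -1 at end of input.
    public int nextToken() {
        while (pos < input.length && TRANS[0][input[pos]] == -1) pos++;
        if (pos == input.length) return -1;
        int start = pos, state = 0;
        // The inner loop is ~two array loads per character; a 3-character
        // token amortizes the call overhead over very little real work.
        while (pos < input.length && (state = TRANS[state][input[pos]]) != -1) pos++;
        return pos - start;
    }
}
```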
I suppose, speaking of Java, that this method's size is decent enough to be conveniently inlined. It can also be proven monomorphic in most cases, I believe, so no virtual-dispatch calls. Of course, that depends on the JIT and cannot be taken for granted.

>> I am asking because the way
>> I see this issue is that pursuing two different paths in an NFA has no
>> inherent coupling. It is simply a matter of replicating state and
>> merging it back to a single outcome state upon reaching some barrier
>> marking an "end of work" state. Am I missing something?
>
> I don't think so. Once you're using an NFA for lexing, you'll
> likely be insanely less efficient than a minimized DFA,
> so the overhead of a function call per token should be
> unnoticed amidst the general slowness :-).

Are you thinking of the costs of backtracking here?

> If you were
> trying to use multiple cores to follow paths in an NFA,
> then "merging it back" sure sounds like a place to stall
> on attempting to update memory that is shared between
> cores. I wouldn't predict multiple cores would be faster for
> that without measuring it first on the target CPU.

Actually, I don't think this merging is going to be needed. Pursuing every possible path until it collapses into a final state or an error may be sufficient. This is, of course, seriously undermined by arbitrary user code that may be intertwined with the lexer :-))

> If you
> haven't looked at how ugly things have gotten inside
> in the last few years, I recommend:
>
> http://www.infoq.com/presentations/click-crash-course-modern-hardware

Thx, I will take a look.

> The root problem for getting anywhere close to the theoretically
> possible speeds of putting lexing on a different core from
> parsing is the constant stalling over the shared data (the
> tokens).

They don't actually need to "share" data. The lexer simply provides data, which means, in my opinion, that the input can be fully tokenized first and then passed to the parser.
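For what it's worth, the "tokenize fully, then hand off" idea can be sketched like this (Token and the trivial space-splitting lexer are hypothetical stand-ins of mine, not ANTLR's classes):

```java
import java.util.ArrayList;
import java.util.List;

// The lexer owns the token buffer exclusively while it runs; the parser
// thread sees it only after lexing finishes, so there is no per-token
// shared-memory traffic between cores -- just one hand-off at the end.
record Token(String text, int offset) {}

final class BatchLexer {
    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            while (i < input.length() && input.charAt(i) == ' ') i++;
            int start = i;
            while (i < input.length() && input.charAt(i) != ' ') i++;
            if (i > start) out.add(new Token(input.substring(start, i), start));
        }
        // Publish an immutable snapshot, safe to hand to another thread.
        return List.copyOf(out);
    }
}
```

A parser running on another core can then consume the list with no synchronization beyond the single hand-off (e.g., completing a Future), which is exactly the decoupling described above.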
--
Greetings
Marcin Rzeźnicki

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
