I just reformated the whole thing and it kind of works.
http://jsfiddle.net/xavierm02/ndDVa/1/
I get a CST and I think I can manage to turn it into an AST.
But I still have a problem with the CFG... As shown, it works
But if I change the order of the things to match it fails. At last on
recursive rules.
Here is an example :
Text
Text Letter
The code about won't work with my "parser" but if i rewrite it to
avoid having the recursive rule first, it works:
Text
Letter Text
The above example works...
But I'm quite sure I won't always be able to avoid having a recursive
rule as first rule so... Do you guys see anything I could do to avoid
it?
I tried allowing a given number of recursion before simply returning
false when seeing a recursive rule but if you have more than that
number, it fails :/
And it I don't do it at all, I get an infinite loop.
Here's the exact same code except I changed the order in the
declaration of MultiplicativeExpression or AdditiveExpression.
http://jsfiddle.net/xavierm02/ndDVa/2/
On Sun, Oct 9, 2011 at 8:16 PM, Xavier MONTILLET
<[email protected]> wrote:
> I figured out how to get something looking like an AST.
> http://jsfiddle.net/xavierm02/HG6Bx/1/
> But comments and ideas on how to throw errors are still welcome.
>
> On Sun, Oct 9, 2011 at 6:10 PM, Xavier MONTILLET
> <[email protected]> wrote:
>> And how do you throw errors with a CFG ?
>> I mean JSLint does throw errors but the Lexer doesn't use a CFG...
>> If you use a CFG, how do you throw errors ? You take the longest /
>> deepest "possibility" that had a meaning ?
>>
>> On Sun, Oct 9, 2011 at 4:17 PM, Xavier MONTILLET
>> <[email protected]> wrote:
>>> Btw, I have no knowledge in Computer Science at all. So if you know
>>> tutorials / books explaining how to build a parser without needed too
>>> much knowledge, please tell me. Because every single tutorial I tried
>>> to read started with strange formulas I couldn't understand and,
>>> instead of providing pseudo-code, just described what the code should
>>> do, gave mathematical methods which I couldn't understand to check
>>> whether the result was going to be what we wanted or not and so on...
>>>
>>> (I know JavaScript quite well because I started doing websites in
>>> high-school but since I left High-school (last year), I've only been
>>> doing Maths and Physics in "classes preparatoire" (a France-specific
>>> system where you do a lot of Maths / Physics for two years in order to
>>> prepare "contests" which, depending on what your ranking is, will
>>> allow you to choose a given school or not. And I won't be doing any
>>> computer science until then...)
>>>
>>> On Sun, Oct 9, 2011 at 4:05 PM, Xavier MONTILLET
>>> <[email protected]> wrote:
>>>> I've started writing the Lexer and for it I used some kind of CFG.
>>>> But I'm not sure whether I understood the aim of those things
>>>> properly. I mean is the CFG supposed to let you build a syntax tree or
>>>> just a token list ? Because I can't figure out how to build the syntax
>>>> tree from it...
>>>> Anyway, I wrote a CFG for basic Maths :
>>>>
>>>> Digit
>>>> '0'
>>>> '1'
>>>> '2'
>>>> '3'
>>>> '4'
>>>> '5'
>>>> '6'
>>>> '7'
>>>> '8'
>>>> '9'
>>>>
>>>> Operator
>>>> '+'
>>>> '-'
>>>> '*'
>>>> '/'
>>>>
>>>> Expression
>>>> '(' Expression ')'
>>>> Expression Operator Expression
>>>> Number
>>>>
>>>> And as I didn't want to bother writing a parser to parse the CFG for
>>>> my parser, I transformed it in JS :
>>>>
>>>> var rules = ( function ( ) {
>>>> var Digit = [ ];
>>>> var Expression = [ ];
>>>> var Operator = [ ];
>>>> Digit.name = 'Digit';
>>>> Expression.name = 'Expression';
>>>> Operator.name = 'Operator';
>>>> Digit.push(
>>>> [ '0' ],
>>>> [ '1' ],
>>>> [ '2' ],
>>>> [ '3' ],
>>>> [ '4' ],
>>>> [ '5' ],
>>>> [ '6' ],
>>>> [ '7' ],
>>>> [ '8' ],
>>>> [ '9' ]
>>>> );
>>>> Expression.push(
>>>> [ '(', Expression, ')' ],
>>>> [ Expression, Operator, Expression ],
>>>> [ Digit ]
>>>> );
>>>> Operator.push(
>>>> [ '+' ],
>>>> [ '-' ],
>>>> [ '*' ],
>>>> [ '/' ]
>>>> );
>>>> return {
>>>> Digit: Digit,
>>>> Expression: Expression,
>>>> Operator: Operator
>>>> };
>>>> } )( );
>>>>
>>>> The name properties are here because I had some infinite loops so it
>>>> was easier to debug with it...
>>>> And I define the arrays before so that I can reference them in others.
>>>>
>>>> Then, with this, I built a lexer:
>>>>
>>>> http://jsfiddle.net/xavierm02/tUCXY/
>>>>
>>>> Is it good or not ?
>>>> Any comment would be welcome. The only thing I'm planning on changing
>>>> is the j += newTokens.join( '' ).length; I probably need it because I
>>>> didn't store indexes properly... Or maybe because I use readTokens
>>>> that calls itself instead of a loop.
>>>>
>>>> Thanks in advance for your comments.
>>>>
>>>> On Mon, Oct 3, 2011 at 9:11 PM, Xavier MONTILLET
>>>> <[email protected]> wrote:
>>>>> Ok thank you :)
>>>>>
>>>>> On Sat, Oct 1, 2011 at 4:31 PM, Lasse Reichstein
>>>>> <[email protected]> wrote:
>>>>>> There is nothing in the complexity of '+' that mandates your Abstract
>>>>>> Syntax
>>>>>> Tree layout. It's still pure syntax at this point.
>>>>>> However, any use of the parsed syntax must behave as if the (+ a b c)
>>>>>> syntax
>>>>>> tree is equivalent to (+ (+ a b) c), so it won't buy you much, and
>>>>>> probably
>>>>>> just makes using the parsed syntax more complex.
>>>>>> If repeated use of the same operator happens a lot, and you can easily
>>>>>> handle the pairing later, by all means do make a single vairable-sized
>>>>>> node.
>>>>>> E.g., in the RegExp grammar, adjacent literal characters are really a
>>>>>> sequence of Alternatives of Terms (of Atoms), but it happens so often
>>>>>> that
>>>>>> you probably want to parse /abel.*/ as containing a single text-node with
>>>>>> the text "abel".
>>>>>> Just don't optimize the abstract syntax without considering how it's
>>>>>> going
>>>>>> to be used.
>>>>>> /L
>>>>>>
>>>>>> On Sat, Oct 1, 2011 at 12:28 PM, Xavier MONTILLET
>>>>>> <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Right. I totally forgot that...
>>>>>>> So I have to keep only two operands per operator.
>>>>>>> Thank you for answering :)
>>>>>>>
>>>>>>> On Sat, Oct 1, 2011 at 12:48 AM, Poetro <[email protected]> wrote:
>>>>>>> > The + operator is not just for numbers and can have sideeffects, if
>>>>>>> > the operands have different types or the operands are not numbers or
>>>>>>> > strings (and even in that case). This makes the + operator tricky.
>>>>>>> >
>>>>>>> >>>> "boo" + 1 + 2
>>>>>>> > "boo12"
>>>>>>> >>>> 1 + 2 + "boo"
>>>>>>> > "3boo"
>>>>>>> >
>>>>>>> >>>> var a = {toString: function () { return 1; }, valueOf: function ()
>>>>>>> >>>> {
>>>>>>> >>>> return 2; }}, b = 0; a + b
>>>>>>> > 2
>>>>>>> >
>>>>>>> > --
>>>>>>> > Poetro
>>>>>>> >
>>>>>>> > --
>>>>>>> > To view archived discussions from the original JSMentors Mailman list:
>>>>>>> > http://www.mail-archive.com/[email protected]/
>>>>>>> >
>>>>>>> > To search via a non-Google archive, visit here:
>>>>>>> > http://www.mail-archive.com/[email protected]/
>>>>>>> >
>>>>>>> > To unsubscribe from this group, send email to
>>>>>>> > [email protected]
>>>>>>> >
>>>>>>>
>>>>>>> --
>>>>>>> To view archived discussions from the original JSMentors Mailman list:
>>>>>>> http://www.mail-archive.com/[email protected]/
>>>>>>>
>>>>>>> To search via a non-Google archive, visit here:
>>>>>>> http://www.mail-archive.com/[email protected]/
>>>>>>>
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> [email protected]
>>>>>>
>>>>>> --
>>>>>> To view archived discussions from the original JSMentors Mailman list:
>>>>>> http://www.mail-archive.com/[email protected]/
>>>>>>
>>>>>> To search via a non-Google archive, visit here:
>>>>>> http://www.mail-archive.com/[email protected]/
>>>>>>
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected]
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
To view archived discussions from the original JSMentors Mailman list:
http://www.mail-archive.com/[email protected]/
To search via a non-Google archive, visit here:
http://www.mail-archive.com/[email protected]/
To unsubscribe from this group, send email to
[email protected]