Re: [Jchat] feedback on parsing file approach

Joe Bogner Sun, 12 Jan 2014 08:14:16 -0800

Sorry about that. My requirements assumed more contextual
knowledge than they probably should have.


To take a step back:

In the C#/Razor template language, each code block is delimited by:

@{

}

Within a block, you can add C# code to perform whatever functions your
page needs:

@{
       if (Post) {
            Save();
       } else {
           DoSomethingElse();
      }
}

A page can have multiple code blocks, and a code block can have
arbitrarily deep branching, denoted by { }.

Poor code would have many blocks, very large blocks, or very deep nesting:

@{
       if (Post) {
            if (Monday) {
                 if (After5PM) {
                         if (Before8PM) {
                            Save();
                       }
                 }
            }

       } else {
           DoSomethingElse();
      }
}


A code block is a pair of @{ and }, where the closing } is the one
reached when the branch level is back at zero. Let me know if that's
not clear enough. Other templating languages like PHP make it easier.

<?php

if (Foo) {

}

?>
<html>foo</html>


In PHP, you wouldn't need to worry about the curly brace depth for
determining code block start and end. It could be split on <?php ?>.

In Razor, the @{ plays the same role as <?php, and the } reached when
brace depth is zero terminates the block.
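
As a rough illustration of that rule, here is a minimal sketch in Python (not the J or C# code from the gists). The name razor_blocks and its return shape are made up for this example, and it assumes well-formed braces with none hidden inside strings or comments.

```python
def razor_blocks(text):
    """Return a list of (block_text, max_depth) pairs, one per @{ ... } block.

    Depth is 0 directly inside @{ }, and each nested { adds one level.
    """
    blocks = []
    i = 0
    n = len(text)
    while i < n:
        if not text.startswith("@{", i):
            i += 1
            continue
        start = i
        i += 2                      # skip the opening @{
        depth = 0
        max_depth = 0
        closed = False
        while i < n:
            ch = text[i]
            i += 1
            if ch == "{":
                depth += 1
                max_depth = max(max_depth, depth)
            elif ch == "}":
                if depth == 0:      # this } closes the whole block
                    closed = True
                    break
                depth -= 1
        if closed:                  # unterminated blocks are ignored
            blocks.append((text[start:i], max_depth))
    return blocks
```

Calling razor_blocks on a page gives one entry per block, so the block count is just the length of the result.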

So I don't have an exact specification that I'm working towards. I'm
just trying to find out how many @{ } code blocks there are, how
deeply nested the code within is, and how large the largest block is.
For example, if it's more than 20 lines or X characters, it probably
belongs in a separate class or file.
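
The size check could be sketched along these lines. This is only an illustration in Python; summarize_blocks, its dictionary keys, and the default line_limit of 20 are assumptions, and it expects the block texts to have been extracted already by whatever means.

```python
def summarize_blocks(blocks, line_limit=20):
    """Summarize a list of code-block strings.

    Reports how many blocks there are, the size of the largest one in
    characters and in lines, and which blocks exceed line_limit lines.
    """
    line_counts = [len(b.splitlines()) for b in blocks]
    return {
        "blocks": len(blocks),
        "max_chars": max((len(b) for b in blocks), default=0),
        "max_lines": max(line_counts, default=0),
        # indexes of blocks that probably belong in a separate class or file
        "too_long": [i for i, n in enumerate(line_counts) if n > line_limit],
    }
```

Counting the delimiters as part of the block is an arbitrary choice here; it only matters that the count is consistent from run to run.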

Of course, an edge case that would blow up is a code block
that has a brace in a string:

@{
         if (Post) {
              Response.Write("will break a simple parser } } }} ");
         }
}

I don't think that would be common in this code. It's not going to
be used for anything critical, only to help improve
my personal code base, so if there are false positives or errors it's
OK. I'm looking for a "good enough" solution.
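
If the string case ever did matter, one way to sidestep it would be to ignore braces inside string literals. The sketch below is in Python and the name braces_outside_strings is hypothetical. It only understands ordinary double-quoted strings with backslash escapes, so C# verbatim strings (@"..."), character literals, and comments would still fool it.

```python
def braces_outside_strings(code):
    """Return the '{' and '}' characters of code that are not inside
    a double-quoted string literal, in order of appearance."""
    found = []
    in_string = False
    escaped = False
    for ch in code:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # closing quote
        elif ch == '"':
            in_string = True         # opening quote
        elif ch in "{}":
            found.append(ch)
    return found
```

A depth counter fed from this list instead of the raw text would see only one { and one } in the Response.Write example above.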

Hope that helps. Feel free to cancel if I'm not getting progressively
more clear or if the problem is uninteresting to help solve.

Thanks again

Joe



On Sun, Jan 12, 2014 at 10:56 AM, Raul Miller <[email protected]> wrote:
> I quite possibly misunderstood your specifications.
>
> If I simply remove lines 2 and 11 from my gist, calc2 still reports
> three blocks. If I also remove the three blocks which appear between
> lines 2 and 11, calc2 will then report 0 blocks. Is that not what you
> wanted me to count?
>
> Meanwhile, I do not concern myself very much with whether the
> boundaries of a region of text are "inside" or "outside" that region.
> Instead, I go with what seems simple to implement and then use the
> requirements to tweak the code so that the result is correct. Of
> course, the limitation here is that I need to understand your
> requirements. Another limitation is that new requirements will require
> new code (or manual work) - but that seems to me to be unavoidable.
>
> I expect that once we share an understanding of your requirements that
> an explanation of how the code is structured will make more sense.
>
> Thanks,
>
> --
> Raul
>
>
> On Sun, Jan 12, 2014 at 6:50 AM, Joe Bogner <[email protected]> wrote:
>> Thanks for the sequential machine implementation. I tested with
>> different versions of the text block and it doesn't work as I
>> expected, which means I either relayed the requirements wrong or there
>> may be a bug
>>
>> For example, if I take out the first block of @{ }, it reports
>>
>>    calc2 text
>> blocks 0
>> max depth 1
>> max block 25
>> scripts 2
>> max script 49
>>
>> text =: 0 : 0
>> @{
>> Response.Write('start');
>> }
>> <html>
>> <script>
>> alert('start');
>> </script>
>> <div id='Foo'>@Page.Foo</div>
>> <script>
>> alert($('#Foo').val());
>> </script>
>>
>> </html>
>> @{
>> Response.Write('bye');
>> }
>> )
>>
>> My implementation posts the correct answer of two blocks - each pair
>> of @{ and the } that gets back to indent = 0.
>>
>> It looks like yours requires possibly a brace in the block to trigger
>> it as a code block.  It also seems to be summing up the total amount
>> of code and script characters instead of finding the largest one.
>>
>> The Trace looks helpful to debug.
>>
>> I've read through the dictionary and nuvoc a few times for sequential
>> machine and I don't understand it well enough to help troubleshoot
>> your implementation. I'll spend more time with it. I didn't want to go
>> down that rabbit hole until I was sure it could provide a correct
>> result.
>>
>> I thought about posting to programming but wasn't sure how
>> philosophical it would get. Probably better to have started there and
>> then migrate here if it was philosophical. Feel free to move it to
>> programming since we're now on the details of the sequential machine
>> implementation.
>>
>> Thanks again. I appreciate the opportunity to learn.
>>
>> On Sat, Jan 11, 2014 at 10:16 PM, Raul Miller <[email protected]> wrote:
>>> Here's a draft that uses ;:
>>>
>>> https://gist.github.com/rdm/8380234
>>>
>>> (As an aside, perhaps this thread should be on programming? Or at
>>> least, something to think about for next time...)
>>>
>>> Note that I get different character counts than you. Maybe I
>>> misunderstood what you intended to count?
>>>
>>> Let me know if you want me to clarify or rewrite any of that.
>>>
>>> But, briefly, I am using the final states from a ;: trace to mark the
>>> end of each "token" and then classifying the text based on that
>>> analysis. Since this sequential machine is a bit bulky, I decided to
>>> write a small application to build it rather than constructing it by
>>> hand. Since I only care about the state trace, I use no-op for all
>>> operations. Since I want the end state, I use 0 _1 0 0 for ijrd
>>> instead of the default 0 _1 0 _1. This leaves me with my final state
>>> being the "character position" after the last character in text (and
>>> it's reported in the trace rather than being an error condition).
>>>
>>> Thanks,
>>>
>>> --
>>> Raul
>>>
>>> On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
>>>> Thank you for the thoughts. You summarized it well.
>>>>
>>>> I don't need to worry about attributes on the script tag for this use case.
>>>> I am interested in quantifying how much embedded javascript is in each of
>>>> the pages. I don't need to quantify external scripts. I know the code base
>>>> doesn't use the type="javascript" attribute.
>>>>
>>>> The braces should be well formed; otherwise the C# Razor file wouldn't
>>>> compile. It is possible there may be an edge case that turns up when I
>>>> run it against all the files.
>>>>
>>>> I plan to use it to identify areas to refactor in the javascript/c# razor
>>>> code base and then watch it improve over time. I also thought it would be
>>>> interesting to use a concise and expressive language, J, to measure the
>>>> more verbose code base. It doesn't need to be precise in terms of
>>>> characters. For example, it is OK if the script tag characters are counted
>>>> as long as it's consistent. I will be using it to find large problem areas and
>>>> then measure the improvement.
>>>>
>>>> I would be interested in seeing the sequential machine approach or any
>>>> other more idiomatic method than mine. I am fairly satisfied with mine. It
>>>> is fairly clear to me and can likely be extended if needed. I am trying to
>>>> use J more in my day to day and that would help me learn and hopefully
>>>> would be an interesting example for others.
>>>>
>>>> Thanks again
>>>> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
>>>>
>>>>> I think I see how I would do that with a sequential machine. Let me
>>>>> know if you want a working example.
>>>>>
>>>>> Briefly, though, you seem to have three kinds of token pairs:
>>>>>
>>>>> @{   }
>>>>> {  }
>>>>> <script> </script>
>>>>>
>>>>> The ambiguity between the first two is problematic, in the context of
>>>>> errors, but does not matter in well formed cases. A bigger problem in
>>>>> the wild might be that you do not allow for attributes on the script
>>>>> tag.
>>>>>
>>>>> Also, you care about the number of characters between <script>
>>>>> </script> so those characters should be saved as "tokens" even if they
>>>>> are not curly braces. You care about {} between both @{ } and <script>
>>>>> </script> and outside them, and your implementation allows things like
>>>>> @{ <script> } </script>.
>>>>>
>>>>> A full wart-for-wart compatible version would be painful to write. A
>>>>> version which assumed well-formed cases would be much easier to write.
>>>>> But before thinking about coding up an implementation it's probably
>>>>> worth thinking about why you want to do this. The answer to that kind
>>>>> of question can be really interesting and can help identify which
>>>>> warts are unnecessary or possibly even detrimental.
>>>>>
>>>>> So, before I think any more about code, what are your thoughts on what
>>>>> you want to accomplish?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Raul
>>>>>
>>>>>
>>>>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
>>>>> > I have about 300 code files (javascript and embedded code) that I want
>>>>> > to collect some metrics on.  I've written the algorithm using an
>>>>> > imperative style. I actually wrote it first in C# and translated to J
>>>>> >
>>>>> > Here is the code (posted a link for brevity):
>>>>> >
>>>>> > J version:
>>>>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
>>>>> >
>>>>> > C# version:
>>>>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
>>>>> >
>>>>> > This is what it outputs
>>>>> >
>>>>> > calc''
>>>>> > blocks 3
>>>>> > max depth 2
>>>>> > max block 113
>>>>> > scripts 2
>>>>> > max script 26
>>>>> >
>>>>> > Any suggestions on how to do it differently in J? I looked into the
>>>>> > sequential machine some but couldn't figure out how to make it work
>>>>> > (if it could) since my approach required knowledge of the brace depth.
>>>>> >
>>>>> > In terms of requirements:
>>>>> > 1. Take a block of text
>>>>> > 2. Identify the code blocks in the file (start with @{ and end with } )
>>>>> > 3. Count the code blocks
>>>>> > 4. Determine the max depth of the code block
>>>>> > 5. Determine the max size of all the code blocks
>>>>> > 6. Count the javascript blocks
>>>>> > 7. Determine the max size of the javascript block
>>>>> >
>>>>> > Thanks for any feedback or input!
>>>>> >
>>>>> > Joe
>>>>> > ----------------------------------------------------------------------
>>>>> > For information about J forums see http://www.jsoftware.com/forums.htm