Re: [Jchat] feedback on parsing file approach

Raul Miller Sat, 11 Jan 2014 19:17:52 -0800

Here's a draft that uses ;:

https://gist.github.com/rdm/8380234


(As an aside, perhaps this thread should be on programming? Or at
least, something to think about for next time...)

Note that I get different character counts than you. Maybe I
misunderstood what you intended to count?

Let me know if you want me to clarify or rewrite any of that.

But, briefly, I am using the final states from a ;: trace to mark the
end of each "token" and then classifying the text based on that
analysis. Since this sequential machine is a bit bulky, I decided to
write a small application to build it rather than constructing it by
hand. Since I only care about the state trace, I use no-op for all
operations. Since I want the end state, I use 0 _1 0 0 for ijrd
instead of the default 0 _1 0 _1. This leaves me with my final state
being the "character position" after the last character in text (and
it's reported in the trace rather than being an error condition).
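
For anyone who would rather not open the gist, here is a rough Python
sketch of the same general idea. It is not a translation of the J code:
it builds a small state table from the token list, records the state
after every character (the "trace"), and treats states that end in a
complete token as token ends. The token set comes from earlier in this
thread. The names build_machine, trace, token_ends and summarise are
made up for illustration, and the counting convention (script tags
excluded from the script size) is an assumption, which may be one
reason counts differ between implementations.

```python
# Hypothetical sketch: table-driven scanner with a per-character state trace.
TOKENS = ["@{", "{", "}", "<script>", "</script>"]

def build_machine(tokens):
    """Build a transition table whose states are the prefixes of the tokens."""
    prefixes = {t[:k] for t in tokens for k in range(len(t) + 1)}
    alphabet = set("".join(tokens))

    def step(state, ch):
        # New state: longest suffix of state+ch that is a prefix of some token.
        s = state + ch
        return next(s[i:] for i in range(len(s) + 1) if s[i:] in prefixes)

    return {(p, c): step(p, c) for p in prefixes for c in alphabet}

def trace(table, text):
    """Return the state reached after each character; no other action is taken."""
    state, states = "", []
    for ch in text:
        # Characters that appear in no token always reset to the empty state.
        state = table.get((state, ch), "")
        states.append(state)
    return states

def token_ends(tokens, states):
    """List (end_offset, token) for every position where a token just ended."""
    ends = []
    for i, state in enumerate(states):
        matches = [t for t in tokens if state.endswith(t)]
        if matches:
            ends.append((i + 1, max(matches, key=len)))
    return ends

def summarise(text, tokens=TOKENS):
    """Classify the token ends into a few of the metrics discussed in the thread."""
    ends = token_ends(tokens, trace(build_machine(tokens), text))
    depth = max_depth = 0
    script_start, script_sizes = None, []
    for end, tok in ends:
        if tok in ("@{", "{"):
            depth += 1
            max_depth = max(max_depth, depth)
        elif tok == "}":
            depth -= 1
        elif tok == "<script>":
            script_start = end
        elif tok == "</script>" and script_start is not None:
            # Assumed convention: count only characters between the two tags.
            script_sizes.append(end - len(tok) - script_start)
            script_start = None
    return {
        "scripts": len(script_sizes),
        "max script": max(script_sizes, default=0),
        "max depth": max_depth,
    }
```

Calling summarise on the text of a file returns a small dictionary of
counts. Like the approach described above, it assumes well-formed
input and makes no attempt to handle attributes on the script tag.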

Thanks,

-- 
Raul

On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
> Thank you for the thoughts. You summarized it well.
>
> I don't need to worry about attributes on the script tag for this use case.
> I am interested in quantifying how much embedded javascript is in each of
> the pages. I don't need to quantify external scripts. I know the code base
> doesn't use the type="javascript" attribute.
>
> The braces should be well formed otherwise the c# razor file wouldn't
> compile. It is possible there may be an edge case that turns up when I
> run it against all the files.
>
> I plan to use it to identify areas to refactor in the javascript/c# razor
> code base and then watch it improve over time. I also thought it would be
> interesting to use a concise and expressive language, J, to measure the
> more verbose code base. It doesn’t need to be precise in terms of
> characters. For example, it is ok if the script tag characters are counted
> as long as it's consistent. I will be using it to find large problem areas and
> then measure the improvement.
>
> I would be interested in seeing the sequential machine approach or any
> other more idiomatic method than mine. I am fairly satisfied with mine. It
> is fairly clear to me and can likely be extended if needed. I am trying to
> use J more in my day to day and that would help me learn and hopefully
> would be an interesting example for others.
>
> Thanks again
> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
>
>> I think I see how I would do that with a sequential machine. Let me
>> know if you want a working example.
>>
>> Briefly, though, you seem to have three kinds of token pairs:
>>
>> @{   }
>> {  }
>> <script> </script>
>>
>> The ambiguity between the first two is problematic, in the context of
>> errors, but does not matter in well formed cases. A bigger problem in
>> the wild might be that you do not allow for attributes on the script
>> tag.
>>
>> Also, you care about the number of characters between <script>
>> </script> so those characters should be saved as "tokens" even if they
>> are not curly braces. You care about {} between both @{ } and <script>
>> </script> and outside them, and your implementation allows things like
>> @{ <script> } </script>.
>>
>> A full wart-for-wart compatible version would be painful to write. A
>> version which assumed well-formed cases would be much easier to write.
>> But before thinking about coding up an implementation it's probably
>> worth thinking about why you want to do this. The answer to that kind
>> of question can be really interesting and can help identify which
>> warts are unnecessary or possibly even detrimental.
>>
>> So, before I think any more about code, what are your thoughts on what
>> you want to accomplish?
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
>> > I have about 300 code files (javascript and embedded code) that I want
>> > to collect some metrics on.  I've written the algorithm using an
>> > imperative style. I actually wrote it first in C# and translated it to J.
>> >
>> > Here is the code (posted a link for brevity):
>> >
>> > J version:
>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
>> >
>> > C# version:
>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
>> >
>> > This is what it outputs
>> >
>> > calc''
>> > blocks 3
>> > max depth 2
>> > max block 113
>> > scripts 2
>> > max script 26
>> >
>> > Any suggestions on how to do it differently in J? I looked into the
>> > sequential machine some but couldn't figure out how to make it work
>> > (if it could) since my approach required knowledge of the brace depth.
>> >
>> > In terms of requirements:
>> > 1. Take a block of text
>> > 2. Identify the code blocks in the file (start with @{ and end with } )
>> > 3. Count the code blocks
>> > 4. Determine the max depth of the code block
>> > 5. Determine the max size of all the code blocks
>> > 6. Count the javascript blocks
>> > 7. Determine the max size of the javascript block
>> >
>> > Thanks for any feedback or input!
>> >
>> > Joe
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
