Re: [Jchat] feedback on parsing file approach

Joe Bogner Sun, 12 Jan 2014 03:50:52 -0800

Thanks for the sequential machine implementation. I tested with
different versions of the text block and it doesn't work as I
expected, which means I either relayed the requirements wrong or there
may be a bug.


For example, if I take out the first block of @{ }, it reports

   calc2 text
blocks 0
max depth 1
max block 25
scripts 2
max script 49

text =: 0 : 0
@{
Response.Write('start');
}
<html>
<script>
alert('start');
</script>
<div id='Foo'>@Page.Foo</div>
<script>
alert($('#Foo').val());
</script>

</html>
@{
Response.Write('bye');
}
)

My implementation posts the correct answer of two blocks - each pair
of @{ and the } that gets back to indent = 0.
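
To make that rule concrete, here is a minimal Python sketch of the
counting. It is not the J or C# code from the gists: the name
count_code_blocks and the choice to measure a block from the @ through
its closing } are assumptions for illustration, and it assumes the
braces are well formed.

```python
def count_code_blocks(text):
    """Return (blocks, max_depth, max_block) for @{ ... } code blocks.

    Hypothetical helper: a block opens at "@{" when the depth is 0 and
    closes at the "}" that brings the depth back to 0. Block size is
    assumed to run from the "@" through that closing "}".
    """
    blocks = 0
    max_depth = 0
    max_block = 0
    depth = 0
    start = 0
    i = 0
    while i < len(text):
        if depth == 0:
            # Outside any block: only "@{" is interesting.
            if text.startswith("@{", i):
                depth = 1
                start = i
                max_depth = max(max_depth, depth)
                i += 2
                continue
        elif text[i] == "{":
            # Nested brace inside a block.
            depth += 1
            max_depth = max(max_depth, depth)
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Back to depth 0: the block is complete.
                blocks += 1
                max_block = max(max_block, i - start + 1)
        i += 1
    return blocks, max_depth, max_block
```

On the text above this should give two blocks with a max depth of 1,
whether or not a block contains a nested brace.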

It looks like yours may require a brace inside the block before it
counts it as a code block.  It also seems to sum the total number of
code and script characters instead of finding the largest one.
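
The difference between the largest script and the total can be sketched
like this in Python. This is only an illustration, not either
implementation: script_sizes is a hypothetical helper, it assumes bare
<script> tags with no attributes, and it counts only the characters
between the tags.

```python
def script_sizes(text, open_tag="<script>", close_tag="</script>"):
    """Return the body length of each <script>...</script> pair, in order.

    Hypothetical helper: assumes bare tags with no attributes and
    ignores an unterminated trailing <script>.
    """
    sizes = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break
        body_start = start + len(open_tag)
        end = text.find(close_tag, body_start)
        if end == -1:
            break  # no closing tag: stop rather than guess
        sizes.append(end - body_start)
        pos = end + len(close_tag)
    return sizes


def largest_and_total(sizes):
    """Return (largest, total); an empty list gives (0, 0)."""
    return (max(sizes) if sizes else 0, sum(sizes))
```

The first value is what I mean by "max script"; the second is what a
running total over all scripts would report.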

The Trace looks helpful to debug.

I've read the Dictionary and NuVoc entries for the sequential machine
a few times, but I don't understand it well enough to help
troubleshoot your implementation. I'll spend more time with it. I
didn't want to go down that rabbit hole until I was sure it could
provide a correct result.

I thought about posting to programming but wasn't sure how
philosophical it would get. Probably better to have started there and
then migrate here if it was philosophical. Feel free to move it to
programming since we're now on the details of the sequential machine
implementation.

Thanks again. I appreciate the opportunity to learn.

On Sat, Jan 11, 2014 at 10:16 PM, Raul Miller <[email protected]> wrote:
> Here's a draft that uses ;:
>
> https://gist.github.com/rdm/8380234
>
> (As an aside, perhaps this thread should be on programming? Or at
> least, something to think about for next time...)
>
> Note that I get different character counts than you. Maybe I
> misunderstood what you intended to count?
>
> Let me know if you want me to clarify or rewrite any of that.
>
> But, briefly, I am using the final states from a ;: trace to mark the
> end of each "token" and then classifying the text based on that
> analysis. Since this sequential machine is a bit bulky, I decided to
> write a small application to build it rather than constructing it by
> hand. Since I only care about the state trace, I use no-op for all
> operations. Since I want the end state, I use 0 _1 0 0 for ijrd
> instead of the default 0 _1 0 _1. This leaves me with my final state
> being the "character position" after the last character in text (and
> it's reported in the trace rather than being an error condition).
>
> Thanks,
>
> --
> Raul
>
> On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
>> Thank you for the thoughts. You summarized it well.
>>
>> I don't need to worry about attributes on the script tag for this use case.
>> I am interested in quantifying how much embedded javascript is in each of
>> the pages. I don't need to quantify external scripts. I know the code base
>> doesnt use the type="javascript" attribute
>>
>> The braces should be well formed otherwise the c# razor file wouldn't
>> compile. It is possible there may be an edgecase which can be found when I
>> run it against all the files.
>>
>> I plan to use it to identify areas to refactor in the javascript/c# razor
>> code base and then watch it improve over time. I also thought it would be
>> interesting to use a concise and expressive language, J, to measure the
>> more verbose code base. It doesn't need to be precise in terms of
>> characters. For example, it is ok if the script tag characters are counted
>> as long as it's consistent. I will be using it to find large problem areas
>> and then measure the improvement.
>>
>> I would be interested in seeing the sequential machine approach or any
>> other more idiomatic method than mine. I am fairly satisfied with mine. It
>> is fairly clear to me and can likely be extended if needed. I am trying to
>> use J more in my day to day and that would help me learn and hopefully
>> would be an interesting example for others.
>>
>> Thanks again
>> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
>>
>>> I think I see how I would do that with a sequential machine. Let me
>>> know if you want a working example.
>>>
>>> Briefly, though, you seem to have three kinds of token pairs:
>>>
>>> @{   }
>>> {  }
>>> <script> </script>
>>>
>>> The ambiguity between the first two is problematic, in the context of
>>> errors, but does not matter in well formed cases. A bigger problem in
>>> the wild might be that you do not allow for attributes on the script
>>> tag.
>>>
>>> Also, you care about the number of characters between <script>
>>> </script> so those characters should be saved as "tokens" even if they
>>> are not curly braces. You care about {} between both @{ } and <script>
>>> </script> and outside them, and your implementation allows things like
>>> @{ <script> } </script>.
>>>
>>> A full wart-for-wart compatible version would be painful to write. A
>>> version which assumed well-formed cases would be much easier to write.
>>> But before thinking about coding up an implementation it's probably
>>> worth thinking about why you want to do this. The answer to that kind
>>> of question can be really interesting and can help identify which
>>> warts are unnecessary or possibly even detrimental.
>>>
>>> So, before I think any more about code, what are your thoughts on what
>>> you want to accomplish?
>>>
>>> Thanks,
>>>
>>> --
>>> Raul
>>>
>>>
>>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
>>> > I have about 300 code files (javascript and embedded code) that I want
>>> > to collect some metrics on.  I've written the algorithm using an
>>> > imperative style. I actually wrote it first in C# and translated it to J.
>>> >
>>> > Here is the code (posted a link for brevity):
>>> >
>>> > J version:
>>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
>>> >
>>> > C# version:
>>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
>>> >
>>> > This is what it outputs:
>>> >
>>> > calc''
>>> > blocks 3
>>> > max depth 2
>>> > max block 113
>>> > scripts 2
>>> > max script 26
>>> >
>>> > Any suggestions on how to do it differently in J? I looked into the
>>> > sequential machine some but couldn't figure out how to make it work
>>> > (if it could) since my approach required knowledge of the brace depth.
>>> >
>>> > In terms of requirements:
>>> > 1. Take a block of text
>>> > 2. Identify the code blocks in the file (start with @{ and end with } )
>>> > 3. Count the code blocks
>>> > 4. Determine the max depth of the code block
>>> > 5. Determine the max size of all the code blocks
>>> > 6. Count the javascript blocks
>>> > 7. Determine the max size of the javascript block
>>> >
>>> > Thanks for any feedback or input!
>>> >
>>> > Joe
>>> > ----------------------------------------------------------------------
>>> > For information about J forums see http://www.jsoftware.com/forums.htm
