Re: [Jchat] feedback on parsing file approach

Raul Miller Sun, 12 Jan 2014 07:57:45 -0800

I quite possibly misunderstood your specifications.

If I simply remove lines 2 and 11 from my gist, calc2 still reports
three blocks. If I also remove the three blocks which appear between
lines 2 and 11, calc2 will then report 0 blocks. Is that not what you
wanted me to count?


Meanwhile, I do not concern myself very much with whether the
boundaries of a region of text are "inside" or "outside" that region.
Instead, I go with what seems simple to implement and then use the
requirements to tweak the code so that the result is correct. Of
course, the limitation here is that I need to understand your
requirements. Another limitation is that new requirements will require
new code (or manual work) - but that seems to me to be unavoidable.

I expect that once we share an understanding of your requirements, an
explanation of how the code is structured will make more sense.
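
As a point of comparison for the counting rule under discussion (a block
runs from @{ to the } that brings the brace depth back to 0, as described
in Joe's message below), here is a minimal imperative sketch in Python. It
is not the code from either gist and it does not use ;: at all. The name
measure, the keys of the returned dictionary, and the size conventions (a
block is counted from its @ through its closing }, a script as the
characters strictly between the tags) are assumptions made for
illustration. It also assumes well-formed braces and <script> tags without
attributes.

```python
def measure(text):
    """Collect block and script metrics for a Razor-style page (hypothetical helper)."""
    blocks = max_depth = max_block = 0
    depth = 0            # brace nesting inside the current @{ ... } block
    block_start = None   # index of the '@' that opened the current block
    i = 0
    while i < len(text):
        if block_start is None and text.startswith('@{', i):
            # A new code block opens at depth 1.
            block_start, depth = i, 1
            max_depth = max(max_depth, depth)
            i += 2
            continue
        if block_start is not None:
            if text[i] == '{':
                depth += 1
                max_depth = max(max_depth, depth)
            elif text[i] == '}':
                depth -= 1
                if depth == 0:
                    # The brace that returns to depth 0 closes the block.
                    blocks += 1
                    max_block = max(max_block, i + 1 - block_start)
                    block_start = None
        i += 1

    # Script regions: assumed to be plain <script> ... </script> pairs.
    open_tag, close_tag = '<script>', '</script>'
    scripts = max_script = 0
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start < 0:
            break
        end = text.find(close_tag, start)
        if end < 0:
            break
        scripts += 1
        # Track the largest single script, not the sum of all scripts.
        max_script = max(max_script, end - (start + len(open_tag)))
        pos = end + len(close_tag)

    return {'blocks': blocks, 'max_depth': max_depth, 'max_block': max_block,
            'scripts': scripts, 'max_script': max_script}


print(measure("@{ if (x) { y(); } }\n<script>x</script>\n"))
```

Run against the sample text quoted below, this sketch should report 2
blocks, a max depth of 1, and 2 scripts. The character counts depend on
the size conventions assumed above, so they need not match calc or calc2.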

Thanks,

-- 
Raul


On Sun, Jan 12, 2014 at 6:50 AM, Joe Bogner <[email protected]> wrote:
> Thanks for the sequential machine implementation. I tested with
> different versions of the text block and it doesn't work as I
> expected, which means I either relayed the requirements wrong or there
> may be a bug.
>
> For example, if I take out the first block of @{ }, it reports
>
>    calc2 text
> blocks 0
> max depth 1
> max block 25
> scripts 2
> max script 49
>
> text =: 0 : 0
> @{
> Response.Write('start');
> }
> <html>
> <script>
> alert('start');
> </script>
> <div id='Foo'>@Page.Foo</div>
> <script>
> alert($('#Foo').val());
> </script>
>
> </html>
> @{
> Response.Write('bye');
> }
> )
>
> My implementation posts the correct answer of two blocks - each pair
> of @{ and the } that gets back to indent = 0.
>
> It looks like yours requires possibly a brace in the block to trigger
> it as a code block.  It also seems to be summing up the total amount
> of code and script characters instead of finding the largest one.
>
> The Trace looks helpful to debug.
>
> I've read through the dictionary and nuvoc a few times for sequential
> machine and I don't understand it well enough to help troubleshoot
> your implementation. I'll spend more time with it. I didn't want to go
> down that rabbit hole until I was sure it could provide a correct
> result.
>
> I thought about posting to programming but wasn't sure how
> philosophical it would get. Probably better to have started there and
> then migrate here if it was philosophical. Feel free to move it to
> programming since we're now on the details of the sequential machine
> implementation.
>
> Thanks again. I appreciate the opportunity to learn.
>
> On Sat, Jan 11, 2014 at 10:16 PM, Raul Miller <[email protected]> wrote:
>> Here's a draft that uses ;:
>>
>> https://gist.github.com/rdm/8380234
>>
>> (As an aside, perhaps this thread should be on programming? Or at
>> least, something to think about for next time...)
>>
>> Note that I get different character counts than you. Maybe I
>> misunderstood what you intended to count?
>>
>> Let me know if you want me to clarify or rewrite any of that.
>>
>> But, briefly, I am using the final states from a ;: trace to mark the
>> end of each "token" and then classifying the text based on that
>> analysis. Since this sequential machine is a bit bulky, I decided to
>> write a small application to build it rather than constructing it by
>> hand. Since I only care about the state trace, I use no-op for all
>> operations. Since I want the end state, I use 0 _1 0 0 for ijrd
>> instead of the default 0 _1 0 _1. This leaves me with my final state
>> being the "character position" after the last character in text (and
>> it's reported in the trace rather than being an error condition).
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
>>> Thank you for the thoughts. You summarized it well.
>>>
>>> I don't need to worry about attributes on the script tag for this use case.
>>> I am interested in quantifying how much embedded javascript is in each of
>>> the pages. I don't need to quantify external scripts. I know the code base
>>> doesn't use the type="javascript" attribute.
>>>
>>> The braces should be well formed otherwise the c# razor file wouldn't
>>> compile. It is possible there may be an edgecase which can be found when I
>>> run it against all the files.
>>>
>>> I plan to use it to identify areas to refactor in the javascript/c# razor
>>> code base and then watch it improve over time. I also thought it would be
>>> interesting to use a concise and expressive language, J, to measure the
>>> more verbose code base. It doesn’t need to be precise in terms of
>>> characters. For example, it is ok if the script tag characters are counted
>>> as long as it's consistent. I will be using it to find large problem areas and
>>> then measure the improvement.
>>>
>>> I would be interested in seeing the sequential machine approach or any
>>> other more idiomatic method than mine. I am fairly satisfied with mine. It
>>> is fairly clear to me and can likely be extended if needed. I am trying to
>>> use J more in my day to day and that would help me learn and hopefully
>>> would be an interesting example for others.
>>>
>>> Thanks again
>>> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
>>>
>>>> I think I see how I would do that with a sequential machine. Let me
>>>> know if you want a working example.
>>>>
>>>> Briefly, though, you seem to have three kinds of token pairs:
>>>>
>>>> @{   }
>>>> {  }
>>>> <script> </script>
>>>>
>>>> The ambiguity between the first two is problematic, in the context of
>>>> errors, but does not matter in well formed cases. A bigger problem in
>>>> the wild might be that you do not allow for attributes on the script
>>>> tag.
>>>>
>>>> Also, you care about the number of characters between <script>
>>>> </script> so those characters should be saved as "tokens" even if they
>>>> are not curly braces. You care about {} between both @{ } and <script>
>>>> </script> and outside them, and your implementation allows things like
>>>> @{ <script> } </script>.
>>>>
>>>> A full wart-for-wart compatible version would be painful to write. A
>>>> version which assumed well-formed cases would be much easier to write.
>>>> But before thinking about coding up an implementation it's probably
>>>> worth thinking about why you want to do this. The answer to that kind
>>>> of question can be really interesting and can help identify which
>>>> warts are unnecessary or possibly even detrimental.
>>>>
>>>> So, before I think any more about code, what are your thoughts on what
>>>> you want to accomplish?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Raul
>>>>
>>>>
>>>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
>>>> > I have about 300 code files (javascript and embedded code) that I want
>>>> > to collect some metrics on.  I've written the algorithm using an
>>>> > imperative style. I actually wrote it first in C# and translated to J
>>>> >
>>>> > Here is the code (posted a link for brevity):
>>>> >
>>>> > J version:
>>>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
>>>> >
>>>> > C# version:
>>>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
>>>> >
>>>> > This is what it outputs
>>>> >
>>>> > calc''
>>>> > blocks 3
>>>> > max depth 2
>>>> > max block 113
>>>> > scripts 2
>>>> > max script 26
>>>> >
>>>> > Any suggestions on how to do it differently in J? I looked into the
>>>> > sequential machine some but couldn't figure out how to make it work
>>>> > (if it could) since my approach required knowledge of the brace depth.
>>>> >
>>>> > In terms of requirements:
>>>> > 1. Take a block of text
>>>> > 2. Identify the code blocks in the file (start with @{ and end with } )
>>>> > 3. Count the code blocks
>>>> > 4. Determine the max depth of the code block
>>>> > 5. Determine the max size of all the code blocks
>>>> > 6. Count the javascript blocks
>>>> > 7. Determine the max size of the javascript block
>>>> >
>>>> > Thanks for any feedback or input!
>>>> >
>>>> > Joe
>>>> > ----------------------------------------------------------------------
>>>> > For information about J forums see http://www.jsoftware.com/forums.htm