Re: [Jchat] feedback on parsing file approach

Devon McCormick Wed, 15 Jan 2014 08:05:33 -0800

This (old) trick works nicely if you can fit complete expressions into
memory:


   +/\(1 _1 0){~'{}' i. '@{ foo; if (abc) { if (q) { m; } } }'
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 2 2 1 1 0
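
As a rough sketch, the same running sum can give two of the metrics asked
for below. The names depth, maxdepth and nblocks are made up for this
illustration and do not come from the thread. The sketch assumes well-formed
braces and treats every top-level { ... } pair as a block, so it does not
distinguish @{ from a bare {, and the depth it reports includes the outer
@{ brace.

```j
NB. Sketch only: depth, maxdepth and nblocks are hypothetical names.
NB. Running brace depth: +1 at each '{', _1 at each '}', 0 elsewhere.
depth    =: [: +/\ 1 _1 0 {~ '{}' i. ]
NB. Deepest nesting reached anywhere in the text.
maxdepth =: [: >./ depth
NB. Count the '}' characters that bring the running depth back to 0.
nblocks  =: [: +/ ('}' = ]) *. 0 = depth

   maxdepth '@{ foo; if (abc) { if (q) { m; } } }'
3
   nblocks '@{ a; } <b> @{ if (c) { d; } }'
2
```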

The trick becomes clunkier if you have to process expressions in sequential
chunks, but it can be made to work.
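
One way chunked processing might look, as a sketch only: carry the last depth
of one chunk into the next as a left argument. The names depth and chunkdepth
are hypothetical, and the chunk boundaries here were chosen by hand.

```j
NB. Sketch only: depth and chunkdepth are hypothetical names.
depth      =: [: +/\ 1 _1 0 {~ '{}' i. ]
NB. x is the depth carried in from the previous chunk, y is the next chunk.
chunkdepth =: 4 : 'x + depth y'

   d1 =: 0 chunkdepth '@{ foo; if (abc) { '
   d2 =: ({: d1) chunkdepth 'if (q) { m; } } }'
   (d1 , d2) -: depth '@{ foo; if (abc) { if (q) { m; } } }'
1
```

Only one number has to survive between chunks, so the text never needs to be
held in memory all at once.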


On Sun, Jan 12, 2014 at 8:35 AM, Joe Bogner <[email protected]> wrote:

> I like the approach of classifying the text. It seems more generic and
> allows different metrics to be captured outside of the scanning
> routine.
>
> I went down this path and got stuck because I think my state needs to
> also have knowledge of the bracket depth. Here is my failed attempt at
> thinking through it:
>
> Starting simpler, with the @{ } blocks and ignoring the script tags for a
> moment
>
> I think the sequential machine needs to do something like this for
> each character
>
> s is the current state
> c is the current character
>
> initial state = S_NONE = 0
> S_AT = 1
> S_CODE = 2
> S_LEFT = 3
> S_RIGHT = 4
>
> A series of functions that return the state transitions, otherwise use
> the current state
>
> (s=S_NONE & c='@') -> S_AT                     NB. Trigger start @{
> (s=S_AT & c='{') -> S_CODE                     NB. Identify @{
> (s=S_CODE & c='{') -> S_LEFT                   NB. Trigger starting brace, as in: if () {
> (s=S_LEFT & c='}') -> S_RIGHT
> (s=S_RIGHT) -> S_CODE                          NB. Needed for ending brace
> (s=S_CODE & c='}') -> S_NONE
>
> That would handle this case
>
> @{ foo; if (abc) { q; } }
>
> It would be something like
>
> 1222222...3333333420
>
> Assuming it was properly classified, I could count the depth by
> counting up the consecutive 3s and 2s before a 4. I could count the
> block size by counting the numbers greater than 0.
>
> I could count the blocks by pairs of 12 (or something like that)
>
> Depth of brackets has me stuck conceptually, as in
>
> @{ foo; if (abc) { if (q) { m; } } }
>
> 1222222...33333333333333420000
>
> I manually assigned those numbers so they are probably wrong. It feels
> like I need another state for code within a brace
>
> I might be able to come up with a table if there's a state for each
> brace depth or use ranges and greater than/less than
>
> S_BRACE_L_1, S_BRACE_L_2 and S_BRACE_R_1 and S_BRACE_R_2 but that
> doesn't seem practical
>
> Or let the state transition logic use globals and let the state
> transition function assign globals
>
> This is where I got stuck.
>
> On Sun, Jan 12, 2014 at 6:50 AM, Joe Bogner <[email protected]> wrote:
> > Thanks for the sequential machine implementation. I tested with
> > different versions of the text block and it doesn't work as I
> > expected, which means I either relayed the requirements wrong or there
> > may be a bug
> >
> > For example, if I take out the first block of @{ }, it reports
> >
> >    calc2 text
> > blocks 0
> > max depth 1
> > max block 25
> > scripts 2
> > max script 49
> >
> > text =: 0 : 0
> > @{
> > Response.Write('start');
> > }
> > <html>
> > <script>
> > alert('start');
> > </script>
> > <div id='Foo'>@Page.Foo</div>
> > <script>
> > alert($('#Foo').val());
> > </script>
> >
> > </html>
> > @{
> > Response.Write('bye');
> > }
> > )
> >
> > My implementation posts the correct answer of two blocks - each pair
> > of @{ and the } that gets back to indent = 0.
> >
> > It looks like yours requires possibly a brace in the block to trigger
> > it as a code block.  It also seems to be summing up the total amount
> > of code and script characters instead of finding the largest one.
> >
> > The Trace looks helpful to debug.
> >
> > I've read through the dictionary and nuvoc a few times for sequential
> > machine and I don't understand it well enough to help troubleshoot
> > your implementation. I'll spend more time with it. I didn't want to go
> > down that rabbit hole until I was sure it could provide a correct
> > result.
> >
> > I thought about posting to programming but wasn't sure how
> > philosophical it would get. Probably better to have started there and
> > then migrate here if it was philosophical. Feel free to move it to
> > programming since we're now on the details of the sequential machine
> > implementation.
> >
> > Thanks again. I appreciate the opportunity to learn.
> >
> > On Sat, Jan 11, 2014 at 10:16 PM, Raul Miller <[email protected]> wrote:
> >> Here's a draft that uses ;:
> >>
> >> https://gist.github.com/rdm/8380234
> >>
> >> (As an aside, perhaps this thread should be on programming? Or at
> >> least, something to think about for next time...)
> >>
> >> Note that I get different character counts than you. Maybe I
> >> misunderstood what you intended to count?
> >>
> >> Let me know if you want me to clarify or rewrite any of that.
> >>
> >> But, briefly, I am using the final states from a ;: trace to mark the
> >> end of each "token" and then classifying the text based on that
> >> analysis. Since this sequential machine is a bit bulky, I decided to
> >> write a small application to build it rather than constructing it by
> >> hand. Since I only care about the state trace, I use no-op for all
> >> operations. Since I want the end state, I use 0 _1 0 0 for ijrd
> >> instead of the default 0 _1 0 _1. This leaves me with my final state
> >> being the "character position" after the last character in text (and
> >> it's reported in the trace rather than being an error condition).
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >>
> >> On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
> >>> Thank you for the thoughts. You summarized it well.
> >>>
> >>> I don't need to worry about attributes on the script tag for this use
> >>> case. I am interested in quantifying how much embedded javascript is in
> >>> each of the pages. I don't need to quantify external scripts. I know the
> >>> code base doesn't use the type="javascript" attribute.
> >>>
> >>> The braces should be well formed, otherwise the c# razor file wouldn't
> >>> compile. It is possible there may be an edge case which can be found
> >>> when I run it against all the files.
> >>>
> >>> I plan to use it to identify areas to refactor in the javascript/c#
> >>> razor code base and then watch it improve over time. I also thought it
> >>> would be interesting to use a concise and expressive language, J, to
> >>> measure the more verbose code base. It doesn't need to be precise in
> >>> terms of characters. For example, it is ok if the script tag characters
> >>> are counted as long as it's consistent. I will be using it to find large
> >>> problem areas and then measure the improvement.
> >>>
> >>> I would be interested in seeing the sequential machine approach or any
> >>> other more idiomatic method than mine. I am fairly satisfied with mine.
> >>> It is fairly clear to me and can likely be extended if needed. I am
> >>> trying to use J more in my day to day and that would help me learn and
> >>> hopefully would be an interesting example for others.
> >>>
> >>> Thanks again
> >>> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
> >>>
> >>>> I think I see how I would do that with a sequential machine. Let me
> >>>> know if you want a working example.
> >>>>
> >>>> Briefly, though, you seem to have three kinds of token pairs:
> >>>>
> >>>> @{   }
> >>>> {  }
> >>>> <script> </script>
> >>>>
> >>>> The ambiguity between the first two is problematic, in the context of
> >>>> errors, but does not matter in well formed cases. A bigger problem in
> >>>> the wild might be that you do not allow for attributes on the script
> >>>> tag.
> >>>>
> >>>> Also, you care about the number of characters between <script>
> >>>> </script> so those characters should be saved as "tokens" even if they
> >>>> are not curly braces. You care about {} between both @{ } and <script>
> >>>> </script> and outside them, and your implementation allows things like
> >>>> @{ <script> } </script>.
> >>>>
> >>>> A full wart-for-wart compatible version would be painful to write. A
> >>>> version which assumed well-formed cases would be much easier to write.
> >>>> But before thinking about coding up an implementation it's probably
> >>>> worth thinking about why you want to do this. The answer to that kind
> >>>> of question can be really interesting and can help identify which
> >>>> warts are unnecessary or possibly even detrimental.
> >>>>
> >>>> So, before I think any more about code, what are your thoughts on what
> >>>> you want to accomplish?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> --
> >>>> Raul
> >>>>
> >>>>
> >>>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
> >>>> > I have about 300 code files (javascript and embedded code) that I
> >>>> > want to collect some metrics on.  I've written the algorithm using an
> >>>> > imperative style. I actually wrote it first in C# and translated it
> >>>> > to J.
> >>>> >
> >>>> > Here is the code (posted a link for brevity):
> >>>> >
> >>>> > J version:
> >>>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
> >>>> >
> >>>> > C# version:
> >>>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
> >>>> >
> >>>> > This is what it outputs
> >>>> >
> >>>> > calc''
> >>>> > blocks 3
> >>>> > max depth 2
> >>>> > max block 113
> >>>> > scripts 2
> >>>> > max script 26
> >>>> >
> >>>> > Any suggestions on how to do it differently in J? I looked into the
> >>>> > sequential machine some but couldn't figure out how to make it work
> >>>> > (if it could) since my approach required knowledge of the brace
> >>>> > depth.
> >>>> >
> >>>> > In terms of requirements:
> >>>> > 1. Take a block of text
> >>>> > 2. Identify the code blocks in the file (start with @{ and end with } )
> >>>> > 3. Count the code blocks
> >>>> > 4. Determine the max depth of the code block
> >>>> > 5. Determine the max size of all the code blocks
> >>>> > 6. Count the javascript blocks
> >>>> > 7. Determine the max size of the javascript block
> >>>> >
> >>>> > Thanks for any feedback or input!
> >>>> >
> >>>> > Joe
> >>>> >
> >>>> > ----------------------------------------------------------------------
> >>>> > For information about J forums see http://www.jsoftware.com/forums.htm
>



-- 
Devon McCormick, CFA