Excellent! I think your description matches the image well. I went back and walked through the table and realized I missed #5 on the cols. Fixed in this version:
http://imgur.com/oIGD16S

Thank you! I think I'm all set on this one.

On Sun, Jan 12, 2014 at 2:50 PM, Raul Miller <[email protected]> wrote:
> Yes...
>
> Columns in "States" correspond to the boxes in (<"+Chars),<a.-.Chars
>
> And, as you deduced, there is one column for each enumerated
> character, and an extra column for "everything else". If I did not
> specify how to handle a character, that character would be treated as
> an error. This might even be good but I decided it was irrelevant for
> now. (For example, I might want this code analyzer to reject documents
> which contain the ASCII null character. But that would be extra code
> which detracts from presenting the core functionality.)
>
> To better understand States, we can look at what characters are
> recognized in each state. Using a space to represent the "I do not
> recognize this, let's reset the state machine" case, we can index a
> string of the left arguments of appendToken to get:
>
> ' @{}<script></script>'{~States
> @{}<
> @{}<
> @{}<
> @{}<
>     /s
>       c
>        r
>         i
>          p
>           t
>            >
> @{}<
>
>      s
>       c
>        r
>         i
>          p
>           t
>            >
> @{}<
>
> So state 0, the initial state, accepts characters which begin one of
> my "tokens". Also, every "ending state" works exactly like the initial
> state. We only get state transitions on characters, and I wanted
> distinct ending states so that I could use the ending state to
> recognize the matched token.
>
> The blank row for 12{States is basically an error (or, more precisely,
> "unnecessary padding"). It arose because I put the / for </script> in
> 4{States but still allocated space for it in the states table. These
> kinds of warts creep into code (and other systems) all the time. They
> are something we like to remove when we polish our code, or when they
> eat too many resources, but since this is a learning exercise I should
> probably leave that one for you to see if you would like to fix.
>
> Anyways, philosophically, appendToken is adding states to match the
> argument given as a token.
> That means that when in state 0 we need to be able to match the token.
>
> Hypothetically speaking, I might want any state to be able to match
> the token. That would mean that the @{}< we see above would appear in
> every single row. I considered that but it was not relevant to the
> example, so I left it out.
>
> Once we've recognized a character, we then need states for each
> subsequent character in a token. This has to do with why @ and { are
> separate tokens. But this state factoring issue was not my only
> reason. It's just easier to distinguish ending @{ } and { } pairs
> using subtraction after things are classified than through creating
> some special purpose state logic.
>
> I see a new message from you arriving and I think I have wrapped up
> this message. I'll send this now and see how much of it you have made
> irrelevant in your next message. :)
>
> Thanks,
>
> --
> Raul
>
>
>
> On Sun, Jan 12, 2014 at 1:49 PM, Joe Bogner <[email protected]> wrote:
>> Great! Yes, that change counts the blocks the way I need it to.
>>
>> As you pointed out, the requirements weren't very well spec'd and led
>> to ambiguities. I had thought the implementation was relatively clear
>> and it wasn't a good assumption to think that it would be read. I
>> think tests would have worked well to more clearly illustrate the
>> expectations. I couldn't imagine writing a parser without them.
>>
>> Raul wrote:
>>> A related issue is that your measure of "depth" did not include @{ } -
>>> both immediately inside and immediately outside these tokens is depth
>>> 0, if I understand properly what you meant by depth.
>>
>> Yes, that's fine. My goal was to measure how deep the nesting was. It
>> doesn't matter to me if it's zero or one based.
>>
>> Thanks for the tips on how to incorporate logic for detecting { } in
>> strings.
>>
>> We actually weren't that far off on understanding, considering how
>> little you needed to change to count blocks the way I had intended.
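Raul's remark that `@{ }` and `{ }` pairs are easier to tell apart "using subtraction after things are classified" can be illustrated outside J. The following is a Python paraphrase, not the code from the gist. It assumes the text has already been reduced to a list of recognized tokens (`'@'`, `'{'`, `'}'`), builds bit lists analogous to the thread's `Left` and `Codes` (plus a `right` list for closing braces, which is an assumption of this sketch), and takes the running sum of left minus right as the depth. It also assumes `@{` blocks are never nested inside one another.

```python
from itertools import accumulate


def brace_metrics(tokens):
    """Array-style brace metrics over an already-classified token stream.

    tokens: recognized tokens such as '@', '{' and '}'; anything else is ignored.
    Assumes well-formed input in which '@{' blocks are never nested in each other.
    """
    left = [int(t == '{') for t in tokens]    # bits marking every '{'
    right = [int(t == '}') for t in tokens]   # bits marking every '}'
    # bits marking the '{' of an '@{' pair: a '{' whose previous token is '@'
    codes = [int(t == '{' and k > 0 and tokens[k - 1] == '@')
             for k, t in enumerate(tokens)]
    # brace depth after each token: running sum of (left - right)
    depth = list(accumulate(lf - rt for lf, rt in zip(left, right)))
    blocks = sum(codes)                       # one '@{' per code block
    plain_pairs = sum(left) - sum(codes)      # '{ }' pairs that are not '@{ }'
    # directly inside '@{' the running depth is 1; shift it so that it reads 0
    max_inner_depth = max(max(depth) - 1, 0) if blocks else 0
    return {"depth": depth, "blocks": blocks,
            "plain_pairs": plain_pairs, "max_inner_depth": max_inner_depth}
```

For example, `brace_metrics(['@', '{', '{', '}', '}'])` gives depths `[0, 1, 2, 1, 0]`, one block, one plain pair and an inner depth of 1. Counting `blocks` as the sum of `codes`, rather than `sum(left) - sum(codes)`, is the same distinction as using `+/ Codes` instead of `Left -&(+/) Codes` in the thread.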
>>
>> The max blocks still isn't right, but that's OK. I will see if I can
>> fix it or start writing some tests to demonstrate it better.
>>
>> text =: 0 : 0
>> @{ if (foo) { } }
>> )
>>
>> shows max block 2
>>
>> text =: 0 : 0
>> @{ Response.Write("hi"); }
>> )
>>
>> shows max block 0, which leads me to believe it needs a brace inside
>> the code block to start counting. I would have assumed it would be
>> some number of characters close to # ' Response.Write("hi"); '
>>
>> 23
>>
>> I should be able to figure out most of how you did it but I'm stumped
>> on the State table. I think I understand classify except for one part
>> (summarized below).
>>
>>
>> classify=: 1}.Ends i. 2 {"1(5;(States,"+0);((<"+Chars),<a.-.Chars);0 _1 0 0) ;: ]
>>
>> Working right to left
>>
>> NB. I can figure this out later. Your explanation is good and the dictionary covers it
>> ijrd =. 0 _1 0 0
>>
>> NB. The first part makes sense, the second part looks to be a bunch of junk characters?
>> m=.((<"+Chars),<a.-.Chars)
>>
>> In a trivial example, it looks like it classifies the rows the same way:
>>
>> m=.((<"+Chars))
>> (y i.~;m) { (#m),~(#&>m)#i.#m NB. From the dictionary entry
>>
>> 0 1 12 12 12 5 9 12 12 5 12 12 12 7 8 10 12 12 12 12 8 12 12 12 12 2 12
>>
>> m=.((<"+Chars),<a.-.Chars)
>> (y i.~;m) { (#m),~(#&>m)#i.#m NB. From the dictionary entry
>>
>> 0 1 12 12 12 5 9 12 12 5 12 12 12 7 8 10 12 12 12 12 8 12 12 12 12 2 12
>>
>> What is the purpose of <a.-.Chars? Is that "every other character"
>> besides what was specified?
>>
>> s=:(States,"+0)
>>
>> This adds the 0 operation to each of the States per your earlier note
>> of using 0 for a no-op.
>>
>> f=: 5
>> (f;s;m;ijrd) ;: text
>>
>>
>> NB. extract the 3rd column from the trace
>> cols=:2 {"1(f;s;m;ijrd) ;: text
>> 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0
>>
>> NB. turn the col back into the token number from appendToken
>> Ends i. cols
>>
>>
>> I think I understand all of that.
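Joe's experiment suggests the extra `<a.-.Chars` box does not change the classification. In the dictionary expression he quotes, a character found in no box is mapped to `#m`; without the extra box that is 12, and with it every remaining character lands in the extra box, whose index is also 12. A rough Python equivalent of that mapping follows. `CHARS = '@{}</script>'` is inferred from the table headers and the classification output in this thread, not copied from the gist.

```python
CHARS = '@{}</script>'  # assumed: inferred from this thread, not taken from the gist


def classify_chars(text, chars=CHARS):
    """Column index for each character of text.

    A character listed in chars gets its position there; every other character
    gets len(chars), the extra "everything else" column.
    """
    other = len(chars)
    column = {}
    for k, ch in enumerate(chars):
        column.setdefault(ch, k)   # first occurrence wins, as with i.
    return [column.get(ch, other) for ch in text]
```

`classify_chars('@{ Response.Write("hi"); }\n')` reproduces the 27 numbers Joe printed, assuming the text ends with the newline that `0 : 0` keeps (the final 12).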
>>
>> The State table has a shape of 21 13, which is the length of the text
>> token we are looking for on the rows and length of the characters that
>> make up those tokens on the columns.
>>
>> # '@{}<script></script>'
>> 20
>>
>> # Chars
>> 12
>>
>> I took a stab at adding the character and token on the x & y axis. I
>> don't think I have it lined up quite right and I'm sure it doesn't
>> look great on e-mail. If you can help decrypt the table that would be
>> helpful as I am not following completely what appendToken is doing to
>> build it.
>>
>>
>>
>>     @  {  }  <  /  s  c  r  i  p  t  >
>> @   1  2  3  4  0  0  0  0  0  0  0  0  0
>> {   1  2  3  4  0  0  0  0  0  0  0  0  0
>> }   1  2  3  4  0  0  0  0  0  0  0  0  0
>> <   1  2  3  4  0  0  0  0  0  0  0  0  0
>> s   0  0  0  0 13  5  0  0  0  0  0  0  0
>> c   0  0  0  0  0  0  6  0  0  0  0  0  0
>> r   0  0  0  0  0  0  0  7  0  0  0  0  0
>> i   0  0  0  0  0  0  0  0  8  0  0  0  0
>> p   0  0  0  0  0  0  0  0  0  9  0  0  0
>> t   0  0  0  0  0  0  0  0  0  0 10  0  0
>> >   0  0  0  0  0  0  0  0  0  0  0 11  0
>> <   1  2  3  4  0  0  0  0  0  0  0  0  0
>> /   0  0  0  0  0  0  0  0  0  0  0  0  0
>> s   0  0  0  0  0 14  0  0  0  0  0  0  0
>> c   0  0  0  0  0  0 15  0  0  0  0  0  0
>> r   0  0  0  0  0  0  0 16  0  0  0  0  0
>> i   0  0  0  0  0  0  0  0 17  0  0  0  0
>> p   0  0  0  0  0  0  0  0  0 18  0  0  0
>> t   0  0  0  0  0  0  0  0  0  0 19  0  0
>> >   0  0  0  0  0  0  0  0  0  0  0 20  0
>>     1  2  3  4  0  0  0  0  0  0  0  0  0
>>
>> States 5,6,7,8,9,10,11 must be used to track <script> or
>> 14,15,16,17,18,19,20 does it.
>>
>> Not sure why there isn't a state 12
>>
>> Any guidance on the table would be appreciated. This is really cool.
>> Thanks again
>>
>>
>> On Sun, Jan 12, 2014 at 12:16 PM, Raul Miller <[email protected]> wrote:
>>> Ok...
>>>
>>> Translating what I think you are saying into implementation, I think
>>> you want to change
>>>
>>> smoutput 'blocks ',":Left -&(+/) Codes
>>>
>>> to
>>>
>>> smoutput 'blocks ',":+/ Codes
>>>
>>> "Left" was bits marking left curly brackets (there were 6 in your
>>> sample text) while Codes was bits marking instances of @{ (there were
>>> 3 in your sample text).
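Joe's question about state 12, and Raul's answer that it is "unnecessary padding", can be reproduced with a small builder. The Python function below is a hypothetical re-creation of what an appendToken-style builder might do; the gist itself is not reproduced here. Each token allocates one new row per character, but a prefix that is already reachable (the `<` shared by `<script>` and `</script>`) is followed instead, so the row allocated for it stays blank. Ending states then get the same transitions as state 0, as Raul describes. The column order is inferred from the table header.

```python
CHARS = '@{}</script>'   # column order inferred from the table header (assumption)
NCOLS = len(CHARS) + 1   # one extra column for "everything else"


def col(ch):
    """Column for a character: its position in CHARS, else the last column."""
    i = CHARS.find(ch)
    return NCOLS - 1 if i < 0 else i


def build_states(tokens):
    """Hypothetical appendToken-style builder (not the code from the gist).

    Each token allocates len(token) fresh rows. A transition that already
    exists is followed instead, so the row allocated for it is never used.
    Returns the transition table and the ending state of each token.
    """
    states = [[0] * NCOLS]                 # row 0: the initial state
    ends = []
    for tok in tokens:
        base = len(states)                 # first row allocated for this token
        states.extend([0] * NCOLS for _ in tok)
        cur = 0
        for k, ch in enumerate(tok):
            nxt = states[cur][col(ch)]
            if nxt == 0:                   # no transition yet: take the k-th new row
                nxt = base + k
                states[cur][col(ch)] = nxt
            cur = nxt
        ends.append(cur)
    for e in ends:                         # ending states act like the initial state
        states[e] = list(states[0])
    return states, ends
```

With the tokens `@`, `{`, `}`, `<script>` and `</script>` this yields 21 rows of 13 columns, ending states 1, 2, 3, 11 and 20, and an all-zero row 12, which agrees with the table Joe posted.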
>>>
>>> A related issue is that your measure of "depth" did not include @{ } -
>>> both immediately inside and immediately outside these tokens is depth
>>> 0, if I understand properly what you meant by depth.
>>>
>>> You will note, here, that I did not actually read your code very
>>> closely - that is because I was more interested in paraphrasing it
>>> than in copying it, and that means understanding what you were
>>> thinking more than understanding what you implemented. We sometimes
>>> approximate this process using requirements, sometimes using tests and
>>> perhaps in a variety of other ways.
>>>
>>> Also, it might help you to understand the code better if you replaced
>>> every =. in calc2 with =: (=. is great for isolating internal
>>> definitions in explicit verbs, but =: is much better for making things
>>> visible or ... explicit?).
>>>
>>> That said, we can exclude { and } which appear in irrelevant contexts
>>> by first declaring what those contexts are (double quoted strings?
>>> multi-line comments? single line comments?) and then adjusting the
>>> definition of States to distinguish them from the recognized instances
>>> of { and }.
>>>
>>> Let's say that I wanted to exclude { in double quoted strings. Here's
>>> an outline:
>>>
>>> (1) Include " in the definition of Chars
>>> (2) Introduce a new routine appendTokenPair which works like
>>>     appendToken but leaves the sequential machine in an alternate state
>>>     until receiving a second token.
>>> (3) Use this new routine to include " ... " in our definition of States.
>>>
>>> Once this was working, using it for /* ... */ should be trivial,
>>> though the use of multi-character tokens might be an issue, depending
>>> on how appendTokenPair was implemented.
>>>
>>> The thing you need to watch out for, when working with parsers, is
>>> ambiguities. In this example, we had an ambiguity between @{ and {
>>> where hypothetically speaking they might be confused.
>>> This was one of my motivations for focusing on requirements instead
>>> of simply diving into the implementation.
>>>
>>> Being able to move from implementation to specification is not easy -
>>> I love focusing on the computer and I sometimes find human
>>> interactions painful (I do not like bothering people and while I might
>>> occasionally enjoy getting yelled at I find I need to do something to
>>> please people yelling at me after - or at least something I am
>>> comfortable interpreting as pleasing - going away seems to count,
>>> somehow. Mostly, though, I have a lot of respect for heads-down focus,
>>> even when it's taken too far.)
>>>
>>> Does this make sense?
>>>
>>> Thanks,
>>>
>>> --
>>> Raul
>>>
>>>
>>>
>>> On Sun, Jan 12, 2014 at 11:13 AM, Joe Bogner <[email protected]> wrote:
>>>> Sorry about that. My requirements were based on more contextual
>>>> knowledge than they probably should have been.
>>>>
>>>> To take a step back:
>>>>
>>>> In the c#/razor template language, each code block is delimited by:
>>>>
>>>> @{
>>>>
>>>> }
>>>>
>>>> Within a block, you can add c# code to perform whatever functions your
>>>> page needs:
>>>>
>>>> @{
>>>>     if (Post) {
>>>>         Save();
>>>>     } else {
>>>>         DoSomethingElse();
>>>>     }
>>>> }
>>>>
>>>> A page can have multiple code blocks. And a code block can have an
>>>> infinite depth of branching, denoted by { }.
>>>>
>>>> Poor code would have many blocks, or very large blocks or very deep
>>>> nesting.
>>>>
>>>> @{
>>>>     if (Post) {
>>>>         if (Monday) {
>>>>             if (After5PM) {
>>>>                 if (Before8PM) {
>>>>                     Save();
>>>>                 }
>>>>             }
>>>>         }
>>>>
>>>>     } else {
>>>>         DoSomethingElse();
>>>>     }
>>>> }
>>>>
>>>>
>>>> A code block is a pair of @{ } where the } terminates the block once
>>>> the branch level is back to zero. Let me know if that's not clear
>>>> enough. Other templating languages like php make it easier.
>>>>
>>>> <?php
>>>>
>>>> if (Foo) {
>>>>
>>>> }
>>>>
>>>> ?>
>>>> <html>foo</html>
>>>>
>>>>
>>>> In PHP, you wouldn't need to worry about the curly brace depth for
>>>> determining code block start and end. It could be split on <?php ?>
>>>>
>>>> In razor, the @{ is the same as <? and the } when brace depth is zero
>>>> terminates the block.
>>>>
>>>> So I don't have an exact specification that I'm working towards. I'm
>>>> just trying to find out how many @{ } code blocks there are, how
>>>> deeply nested the code within is, and how large the largest block is.
>>>> For example, if it's more than 20 lines or X characters, it probably
>>>> belongs in a separate class or file.
>>>>
>>>> Of course an edge case that would blow up would be if the code block
>>>> has a brace in a string:
>>>>
>>>> @{
>>>>     if (Post) {
>>>>         Response.Write("will break a simple parser } } }} ");
>>>>     }
>>>> }
>>>> I don't think that would be common in this code. It's not going to
>>>> be used for anything of a critical nature other than to help improve
>>>> my personal code base - so if there are false positives or errors it's
>>>> OK. I'm looking for a "good enough" solution.
>>>>
>>>> Hope that helps. Feel free to cancel if I'm not getting progressively
>>>> more clear or if the problem is uninteresting to help solve.
>>>>
>>>> Thanks again
>>>>
>>>> Joe
>>>>
>>>>
>>>>
>>>> On Sun, Jan 12, 2014 at 10:56 AM, Raul Miller <[email protected]> wrote:
>>>>> I quite possibly misunderstood your specifications.
>>>>>
>>>>> If I simply remove lines 2 and 11 from my gist, calc2 still reports
>>>>> three blocks. If I also remove the three blocks which appear between
>>>>> lines 2 and 11, calc2 will then report 0 blocks. Is that not what you
>>>>> wanted me to count?
>>>>>
>>>>> Meanwhile, I do not concern myself very much with whether the
>>>>> boundaries of a region of text are "inside" or "outside" that region.
>>>>> Instead, I go with what seems simple to implement and then use the
>>>>> requirements to tweak the code so that the result is correct. Of
>>>>> course, the limitation here is that I need to understand your
>>>>> requirements. Another limitation is that new requirements will require
>>>>> new code (or manual work) - but that seems to me to be unavoidable.
>>>>>
>>>>> I expect that once we share an understanding of your requirements,
>>>>> an explanation of how the code is structured will make more sense.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Raul
>>>>>
>>>>>
>>>>> On Sun, Jan 12, 2014 at 6:50 AM, Joe Bogner <[email protected]> wrote:
>>>>>> Thanks for the sequential machine implementation. I tested with
>>>>>> different versions of the text block and it doesn't work as I
>>>>>> expected, which means I either relayed the requirements wrong or there
>>>>>> may be a bug.
>>>>>>
>>>>>> For example, if I take out the first block of @{ }, it reports:
>>>>>>
>>>>>> calc2 text
>>>>>> blocks 0
>>>>>> max depth 1
>>>>>> max block 25
>>>>>> scripts 2
>>>>>> max script 49
>>>>>>
>>>>>> text =: 0 : 0
>>>>>> @{
>>>>>> Response.Write('start');
>>>>>> }
>>>>>> <html>
>>>>>> <script>
>>>>>> alert('start');
>>>>>> </script>
>>>>>> <div id='Foo'>@Page.Foo</div>
>>>>>> <script>
>>>>>> alert($('#Foo').val());
>>>>>> </script>
>>>>>>
>>>>>> </html>
>>>>>> @{
>>>>>> Response.Write('bye');
>>>>>> }
>>>>>> )
>>>>>>
>>>>>> My implementation reports the correct answer of two blocks - each pair
>>>>>> of @{ and the } that gets back to indent = 0.
>>>>>>
>>>>>> It looks like yours possibly requires a brace in the block to trigger
>>>>>> it as a code block. It also seems to be summing up the total amount
>>>>>> of code and script characters instead of finding the largest one.
>>>>>>
>>>>>> The Trace looks helpful to debug.
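Joe's definition here (a block is an `@{` together with the `}` that brings the brace depth back to where it started) translates directly into an imperative scan. The Python sketch below is not Joe's J or C# code. It reports the number of blocks, the deepest nesting inside any block (0 when a block contains no braces) and the largest block size in characters. It also ignores braces inside double-quoted strings, the `Response.Write("will break a simple parser } } }} ")` edge case raised elsewhere in this thread, by toggling a flag on `"`. Escaped quotes, single-quoted strings and comments are not handled.

```python
def code_block_metrics(text, skip_strings=True):
    """Count '@{ ... }' blocks; return (blocks, max_depth, max_size).

    A block opens at '@{' and closes at the '}' that brings the brace depth
    back to zero. max_depth counts braces nested inside a block (0 = none);
    max_size is the number of characters between '@{' and its closing '}'.
    With skip_strings, braces inside double-quoted strings are ignored
    (escaped quotes, single-quoted strings and comments are not handled).
    """
    blocks = max_depth = max_size = 0
    depth = 0          # 1 right after '@{', +1 per nested '{'
    start = None       # index just past the '@{' of the open block, if any
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if start is not None and skip_strings and ch == '"':
            in_string = not in_string
        elif in_string:
            pass                               # inside "...": braces don't count
        elif start is None:
            if text.startswith('@{', i):
                start, depth = i + 2, 1
                i += 1                         # also step over the '{'
        elif ch == '{':
            depth += 1
            max_depth = max(max_depth, depth - 1)
        elif ch == '}':
            depth -= 1
            if depth == 0:                     # matching '}' of '@{'
                blocks += 1
                max_size = max(max_size, i - start)
                start = None
        i += 1
    return blocks, max_depth, max_size
```

On Joe's sample text this finds two blocks, no nested braces, and a largest block of 26 characters (the text between the first `@{` and its `}`, newlines included). With `skip_strings=False`, the braces in the string close the block early, which is exactly the failure Joe anticipated.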
>>>>>>
>>>>>> I've read through the dictionary and NuVoc a few times for the
>>>>>> sequential machine and I don't understand it well enough to help
>>>>>> troubleshoot your implementation. I'll spend more time with it. I
>>>>>> didn't want to go down that rabbit hole until I was sure it could
>>>>>> provide a correct result.
>>>>>>
>>>>>> I thought about posting to programming but wasn't sure how
>>>>>> philosophical it would get. Probably better to have started there and
>>>>>> then migrated here if it was philosophical. Feel free to move it to
>>>>>> programming since we're now on the details of the sequential machine
>>>>>> implementation.
>>>>>>
>>>>>> Thanks again. I appreciate the opportunity to learn.
>>>>>>
>>>>>> On Sat, Jan 11, 2014 at 10:16 PM, Raul Miller <[email protected]> wrote:
>>>>>>> Here's a draft that uses ;:
>>>>>>>
>>>>>>> https://gist.github.com/rdm/8380234
>>>>>>>
>>>>>>> (As an aside, perhaps this thread should be on programming? Or at
>>>>>>> least, something to think about for next time...)
>>>>>>>
>>>>>>> Note that I get different character counts than you. Maybe I
>>>>>>> misunderstood what you intended to count?
>>>>>>>
>>>>>>> Let me know if you want me to clarify or rewrite any of that.
>>>>>>>
>>>>>>> But, briefly, I am using the final states from a ;: trace to mark the
>>>>>>> end of each "token" and then classifying the text based on that
>>>>>>> analysis. Since this sequential machine is a bit bulky, I decided to
>>>>>>> write a small application to build it rather than constructing it by
>>>>>>> hand. Since I only care about the state trace, I use no-op for all
>>>>>>> operations. Since I want the end state, I use 0 _1 0 0 for ijrd
>>>>>>> instead of the default 0 _1 0 _1. This leaves me with my final state
>>>>>>> being the "character position" after the last character in text (and
>>>>>>> it's reported in the trace rather than being an error condition).
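Raul's description of using the state trace, together with the `1}.Ends i.` step in `classify`, amounts to this: run the machine over the text, record the state after every character (Raul notes that the `0 _1 0 0` ijrd is what gets the state after the last character reported), and look each state up among the ending states. A minimal Python sketch of that idea follows. It assumes no token is a prefix of another, and it numbers states on demand in a trie-like way rather than with appendToken's table layout, so its state numbers differ from the thread's.

```python
def build(tokens):
    """Trie-style builder: {(state, char): next_state} plus each token's end state.

    States are numbered on demand, so no row is left unused.
    """
    trans, ends, next_free = {}, [], 1
    for tok in tokens:
        cur = 0
        for ch in tok:
            if (cur, ch) not in trans:
                trans[(cur, ch)] = next_free
                next_free += 1
            cur = trans[(cur, ch)]
        ends.append(cur)
    return trans, ends


def token_ends(text, tokens):
    """For each character, the index of the token ending there, else len(tokens).

    Mirrors the thread's `1}.Ends i.` step: the state *after* each character
    (the last character included) is looked up among the ending states.
    """
    trans, ends = build(tokens)
    # transitions out of state 0; ending states reuse them (they act like state 0)
    start = {ch: s for (st, ch), s in trans.items() if st == 0}
    result, cur = [], 0
    for ch in text:
        if cur == 0 or cur in ends:
            cur = start.get(ch, 0)
        else:
            cur = trans.get((cur, ch), 0)   # unrecognized: reset to state 0
        result.append(ends.index(cur) if cur in ends else len(tokens))
    return result
```

`token_ends('a@{b}', ['@', '{', '}', '<script>', '</script>'])` returns `[5, 0, 1, 5, 2]`. A value below 5 marks the character where a token ends and gives that token's index, much as `Ends i.` returns `#Ends` for a state that is not an ending state.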
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> --
>>>>>>> Raul
>>>>>>>
>>>>>>> On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
>>>>>>>> Thank you for the thoughts. You summarized it well.
>>>>>>>>
>>>>>>>> I don't need to worry about attributes on the script tag for this use
>>>>>>>> case. I am interested in quantifying how much embedded javascript is
>>>>>>>> in each of the pages. I don't need to quantify external scripts. I
>>>>>>>> know the code base doesn't use the type="javascript" attribute.
>>>>>>>>
>>>>>>>> The braces should be well formed, otherwise the c# razor file
>>>>>>>> wouldn't compile. It is possible there may be an edge case which can
>>>>>>>> be found when I run it against all the files.
>>>>>>>>
>>>>>>>> I plan to use it to identify areas to refactor in the javascript/c#
>>>>>>>> razor code base and then watch it improve over time. I also thought
>>>>>>>> it would be interesting to use a concise and expressive language, J,
>>>>>>>> to measure the more verbose code base. It doesn't need to be precise
>>>>>>>> in terms of characters. For example, it is OK if the script tag
>>>>>>>> characters are counted as long as it's consistent. I will be using it
>>>>>>>> to find large problem areas and then measure the improvement.
>>>>>>>>
>>>>>>>> I would be interested in seeing the sequential machine approach or
>>>>>>>> any other more idiomatic method than mine. I am fairly satisfied with
>>>>>>>> mine. It is fairly clear to me and can likely be extended if needed.
>>>>>>>> I am trying to use J more in my day to day and that would help me
>>>>>>>> learn and hopefully would be an interesting example for others.
>>>>>>>>
>>>>>>>> Thanks again
>>>>>>>> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think I see how I would do that with a sequential machine. Let me
>>>>>>>>> know if you want a working example.
>>>>>>>>>
>>>>>>>>> Briefly, though, you seem to have three kinds of token pairs:
>>>>>>>>>
>>>>>>>>> @{ }
>>>>>>>>> { }
>>>>>>>>> <script> </script>
>>>>>>>>>
>>>>>>>>> The ambiguity between the first two is problematic, in the context of
>>>>>>>>> errors, but does not matter in well formed cases. A bigger problem in
>>>>>>>>> the wild might be that you do not allow for attributes on the script
>>>>>>>>> tag.
>>>>>>>>>
>>>>>>>>> Also, you care about the number of characters between <script>
>>>>>>>>> </script> so those characters should be saved as "tokens" even if they
>>>>>>>>> are not curly braces. You care about {} between both @{ } and <script>
>>>>>>>>> </script> and outside them, and your implementation allows things like
>>>>>>>>> @{ <script> } </script>.
>>>>>>>>>
>>>>>>>>> A full wart-for-wart compatible version would be painful to write. A
>>>>>>>>> version which assumed well-formed cases would be much easier to write.
>>>>>>>>> But before thinking about coding up an implementation it's probably
>>>>>>>>> worth thinking about why you want to do this. The answer to that kind
>>>>>>>>> of question can be really interesting and can help identify which
>>>>>>>>> warts are unnecessary or possibly even detrimental.
>>>>>>>>>
>>>>>>>>> So, before I think any more about code, what are your thoughts on what
>>>>>>>>> you want to accomplish?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Raul
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
>>>>>>>>> > I have about 300 code files (javascript and embedded code) that I
>>>>>>>>> > want to collect some metrics on. I've written the algorithm using an
>>>>>>>>> > imperative style.
>>>>>>>>> > I actually wrote it first in C# and translated to J.
>>>>>>>>> >
>>>>>>>>> > Here is the code (posted a link for brevity):
>>>>>>>>> >
>>>>>>>>> > J version:
>>>>>>>>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
>>>>>>>>> >
>>>>>>>>> > C# version:
>>>>>>>>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
>>>>>>>>> >
>>>>>>>>> > This is what it outputs:
>>>>>>>>> >
>>>>>>>>> > calc''
>>>>>>>>> > blocks 3
>>>>>>>>> > max depth 2
>>>>>>>>> > max block 113
>>>>>>>>> > scripts 2
>>>>>>>>> > max script 26
>>>>>>>>> >
>>>>>>>>> > Any suggestions on how to do it differently in J? I looked into the
>>>>>>>>> > sequential machine some but couldn't figure out how to make it work
>>>>>>>>> > (if it could) since my approach required knowledge of the brace
>>>>>>>>> > depth.
>>>>>>>>> >
>>>>>>>>> > In terms of requirements:
>>>>>>>>> > 1. Take a block of text
>>>>>>>>> > 2. Identify the code blocks in the file (start with @{ and end with } )
>>>>>>>>> > 3. Count the code blocks
>>>>>>>>> > 4. Determine the max depth of the code block
>>>>>>>>> > 5. Determine the max size of all the code blocks
>>>>>>>>> > 6. Count the javascript blocks
>>>>>>>>> > 7. Determine the max size of the javascript block
>>>>>>>>> >
>>>>>>>>> > Thanks for any feedback or input!
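Requirements 6 and 7 do not need brace depth at all. The Python sketch below assumes attribute-free `<script>` tags that are never nested (Joe confirms that attributes are not a concern for this code base), so a plain scan with `str.find` is enough to count the script blocks and measure the largest body. It is illustrative, not the gist's implementation. Joe says the tag characters may be counted or not as long as the choice is consistent; here they are excluded.

```python
def script_metrics(text, open_tag='<script>', close_tag='</script>'):
    """Return (number of script blocks, size of the largest body).

    Size is the number of characters strictly between the tags.
    Assumes exact, attribute-free tags that are never nested.
    """
    count = largest = 0
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start < 0:
            break
        body = start + len(open_tag)
        end = text.find(close_tag, body)
        if end < 0:                  # unterminated tag: stop counting (assumption)
            break
        count += 1
        largest = max(largest, end - body)
        pos = end + len(close_tag)
    return count, largest
```

For instance, `script_metrics("<script>ab</script><p/><script>abcd</script>")` returns `(2, 4)`.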
>>>>>>>>> >
>>>>>>>>> > Joe
>>>>>>>>> > ----------------------------------------------------------------------
>>>>>>>>> > For information about J forums see http://www.jsoftware.com/forums.htm
