I think I have the state table figured out. I created a little image to help explain it.
http://imgur.com/JT7MAhj

I may post this to the wiki as an example.

Thanks again

On Sun, Jan 12, 2014 at 1:49 PM, Joe Bogner <[email protected]> wrote:
> Great! Yes, that change counts the blocks the way I need it to.
>
> As you pointed out, the requirements weren't very well spec'd and led
> to ambiguities. I had thought the implementation was relatively clear
> and it wasn't a good assumption to think that it would be read. I
> think tests would have worked well to more clearly illustrate the
> expectations. I couldn't imagine writing a parser without them.
>
> Raul wrote:
>> A related issue is that your measure of "depth" did not include @{ } -
>> both immediately inside and immediately outside these tokens is depth
>> 0, if I understand properly what you meant by depth.
>
> Yes, that's fine. My goal was to measure how deep the nesting was. It
> doesn't matter to me if it's zero or one based.
>
> Thanks for the tips on how to incorporate logic for detecting { } in
> strings.
>
> We actually weren't that far off on understanding considering how
> little you needed to change to count blocks the way I had intended.
>
> The max blocks still isn't right, but that's OK. I will see if I can
> fix it or start writing some tests to demonstrate it better.
>
> text =: 0 : 0
> @{ if (foo) { } }
> )
>
> shows max block 2
>
> text =: 0 : 0
> @{ Response.Write("hi"); }
> )
>
> shows max block 0, which leads me to believe it needs a brace inside
> the code block to start counting. I would have assumed it would be
> some number of characters close to # ' Response.Write("hi"); '
>
> 23
>
> I should be able to figure out most of how you did it but I'm stumped
> on the State table. I think I understand classify except for one part
> (summarized below)
>
>
> classify=: 1}.Ends i. 2 {"1(5;(States,"+0);((<"+Chars),<a.-.Chars);0 _1 0 0) ;: ]
>
> Working right to left
>
> NB. I can figure this out later. Your explanation is good and the
> dictionary covers it
> ijrd =. 0 _1 0 0
>
> NB. The first part makes sense, the second part looks to be a bunch of
> junk characters?
> m=.((<"+Chars),<a.-.Chars)
>
> In a trivial example, it looks like it classifies the rows the same way:
>
> m=.((<"+Chars))
> (y i.~;m) { (#m),~(#&>m)#i.#m NB. From the dictionary entry
>
> 0 1 12 12 12 5 9 12 12 5 12 12 12 7 8 10 12 12 12 12 8 12 12 12 12 2 12
>
> m=.((<"+Chars),<a.-.Chars)
> (y i.~;m) { (#m),~(#&>m)#i.#m NB. From the dictionary entry
>
> 0 1 12 12 12 5 9 12 12 5 12 12 12 7 8 10 12 12 12 12 8 12 12 12 12 2 12
>
> What is the purpose of <a.-.Chars? Is that "every other character"
> than what was specified?
>
> s=:(States,"+0)
>
> This adds the 0 operation to each of the States per your earlier note
> of using 0 to no-op
>
> f=: 5
> (f;s;m;ijrd) ;: text
>
>
> NB. extract the 3rd column from the trace
> cols=:2 {"1(f;s;m;ijrd) ;: text
> 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0
>
> NB. turn the col back into the token number from appendToken
> Ends i. cols
>
>
> I think I understand all of that.
>
> The State table has a shape of 21 13, which is the length of the text
> token we are looking for on the rows and length of the characters that
> make up those tokens on the columns.
>
> # '@{}<script></script>'
> 20
>
> # Chars
> 12
>
> I took a stab at adding the character and token on the x & y axis. I
> don't think I have it lined up quite right and I'm sure it doesn't
> look great on e-mail. If you can help decrypt the table that would be
> helpful as I am not following completely what appendToken is doing to
> build it.
>
>
>
>      @  {  }  <  /  s  c  r  i  p  t  >
>   @  1  2  3  4  0  0  0  0  0  0  0  0  0
>   {  1  2  3  4  0  0  0  0  0  0  0  0  0
>   }  1  2  3  4  0  0  0  0  0  0  0  0  0
>   <  1  2  3  4  0  0  0  0  0  0  0  0  0
>   s  0  0  0  0 13  5  0  0  0  0  0  0  0
>   c  0  0  0  0  0  0  6  0  0  0  0  0  0
>   r  0  0  0  0  0  0  0  7  0  0  0  0  0
>   i  0  0  0  0  0  0  0  0  8  0  0  0  0
>   p  0  0  0  0  0  0  0  0  0  9  0  0  0
>   t  0  0  0  0  0  0  0  0  0  0 10  0  0
>   >  0  0  0  0  0  0  0  0  0  0  0 11  0
>   <  1  2  3  4  0  0  0  0  0  0  0  0  0
>   /  0  0  0  0  0  0  0  0  0  0  0  0  0
>   s  0  0  0  0  0 14  0  0  0  0  0  0  0
>   c  0  0  0  0  0  0 15  0  0  0  0  0  0
>   r  0  0  0  0  0  0  0 16  0  0  0  0  0
>   i  0  0  0  0  0  0  0  0 17  0  0  0  0
>   p  0  0  0  0  0  0  0  0  0 18  0  0  0
>   t  0  0  0  0  0  0  0  0  0  0 19  0  0
>   >  0  0  0  0  0  0  0  0  0  0  0 20  0
>      1  2  3  4  0  0  0  0  0  0  0  0  0
>
> States 5,6,7,8,9,10,11 must be used to track <script> or
> 14,15,16,17,18,19,20 does it.
>
> Not sure why there isn't a state 12
>
> Any guidance on the table would be appreciated. This is really cool.
> Thanks again
>
>
> On Sun, Jan 12, 2014 at 12:16 PM, Raul Miller <[email protected]> wrote:
>> Ok...
>>
>> Translating what I think you are saying into implementation, I think
>> you want to change
>>
>> smoutput 'blocks ',":Left -&(+/) Codes
>>
>> to
>>
>> smoutput 'blocks ',":+/ Codes
>>
>> "Left" was bits marking left curly brackets (there were 6 in your
>> sample text) while Codes was bits marking instances of @{ (there were
>> 3 in your sample text).
>>
>> A related issue is that your measure of "depth" did not include @{ } -
>> both immediately inside and immediately outside these tokens is depth
>> 0, if I understand properly what you meant by depth.
>>
>> You will note, here, that I did not actually read your code very
>> closely - that is because I was more interested in paraphrasing it
>> than in copying it, and that means understanding what you were
>> thinking more than understanding what you implemented. We sometimes
>> approximate this process using requirements, sometimes using tests and
>> perhaps in a variety of other ways.
>>
>> Also, it might help you to understand the code better if you replaced
>> every =. in calc2 with =: (=. is great for isolating internal
>> definitions in explicit verbs, but =: is much better for making things
>> visible or ... explicit?).
>>
>> That said, we can exclude { and } which appear in irrelevant contexts
>> by first declaring what those contexts are (double quoted strings?
>> multi-line comments? single line comments?) and then adjusting the
>> definition of State to distinguish them from the recognized instances
>> of { and }.
>>
>> Let's say that I wanted to exclude { in double quoted strings. Here's
>> an outline:
>>
>> (1) Include " in the definition of Chars
>> (2) Introduce a new routine appendTokenPair which works like
>> appendToken but leaves the sequential machine in an alternate state
>> until receiving a second token.
>> (3) use this new routine to include " ... " in our definition of State.
>>
>> Once this was working, using it for /* ... */ should be trivial,
>> though the use of multi-character tokens might be an issue, depending
>> on how appendTokenPair was implemented.
>>
>> The thing you need to watch out for, when working with parsers, is
>> ambiguities. In this example, we had an ambiguity between @{ and {
>> where hypothetically speaking they might be confused. This was one of
>> my motivations for focusing on requirements instead of simply diving
>> into the implementation.
>>
>> Being able to move from implementation to specification is not easy -
>> I love focusing on the computer and I sometimes find human
>> interactions painful (I do not like bothering people and while I might
>> occasionally enjoy getting yelled at I find I need to do something to
>> please people yelling at me after - or at least something I am
>> comfortable interpreting as pleasing - going away seems to count,
>> somehow. Mostly, though, I have a lot of respect for heads-down focus,
>> even when it's taken too far.)
>>
>> Does this make sense?
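
The three-step outline above (treat " as a token that puts the machine into an alternate state until a second " arrives) can be illustrated outside of J. What follows is only a rough Python sketch of that idea, not the J code from the gist and not a use of ;:. The name max_brace_depth is hypothetical, escaped quotes inside strings are assumed not to occur, and every { is counted, including the one in @{.

```python
def max_brace_depth(text):
    """Return the deepest { } nesting, ignoring braces inside "..." strings.

    Assumption: strings use plain double quotes and contain no escapes.
    """
    depth = 0
    max_depth = 0
    in_string = False  # the "alternate state" entered at the first quote
    for ch in text:
        if ch == '"':
            # A second quote returns the scanner to its normal state.
            in_string = not in_string
        elif in_string:
            continue  # characters inside a string never change the depth
        elif ch == '{':
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == '}':
            depth -= 1
    return max_depth


sample = '@{ if (Post) { Response.Write("will break a simple parser } } }} "); } }'
print(max_brace_depth(sample))  # 2
```

Handling /* ... */ the same way would need a two-character opening and closing token, which is the complication the message mentions for multi-character tokens.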
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>>
>> On Sun, Jan 12, 2014 at 11:13 AM, Joe Bogner <[email protected]> wrote:
>>> Sorry about that. My requirements were based on more contextual
>>> knowledge than they probably should have been.
>>>
>>> To take a step back:
>>>
>>> In the c#/razor template language, each code block is delimited by:
>>>
>>> @{
>>>
>>> }
>>>
>>> Within a block, you can add c# code to perform any functions of your
>>> page necessary
>>>
>>> @{
>>>     if (Post) {
>>>         Save();
>>>     } else {
>>>         DoSomethingElse();
>>>     }
>>> }
>>>
>>> A page can have multiple code blocks. And a code block can have an
>>> infinite depth of branching, denoted by { }
>>>
>>> Poor code would have many blocks, or very large blocks or very deep nesting.
>>>
>>> @{
>>>     if (Post) {
>>>         if (Monday) {
>>>             if (After5PM) {
>>>                 if (Before8PM) {
>>>                     Save();
>>>                 }
>>>             }
>>>         }
>>>
>>>     } else {
>>>         DoSomethingElse();
>>>     }
>>> }
>>>
>>>
>>> A code block is pairs of @{ } where } terminates after the branch
>>> level is zero. Let me know if that's not clear enough. Other
>>> templating languages like php make it easier.
>>>
>>> <? php
>>>
>>> if (Foo) {
>>>
>>> }
>>>
>>> ?>
>>> <html>foo</html>
>>>
>>>
>>> In PHP, you wouldn't need to worry about the curly brace depth for
>>> determining code block start and end. It could be split on <?php ?>
>>>
>>> In razor, the @{ is the same as <? and the } when brace depth is zero
>>> terminates the block
>>>
>>> So I don't have an exact specification that I'm working towards. I'm
>>> just trying to find out how many @{ } code blocks there are, how
>>> deeply nested the code within is, and how large the largest block is.
>>> For example, if it's more than 20 lines or X characters, it probably
>>> belongs in a separate class or file
>>>
>>> Of course an edge case that would blow up would be if the code block
>>> has a brace in a string
>>>
>>> @{
>>>     if (Post) {
>>>         Response.Write("will break a simple parser } } }} ");
>>>     }
>>> }
>>> I don't think that would be extensive in this code. It's not going to
>>> be used for anything of a critical nature other than to help improve
>>> my personal code base - so if there are false positives or errors it's
>>> OK. I'm looking for a "good enough" solution.
>>>
>>> Hope that helps. Feel free to cancel if I'm not getting progressively
>>> more clear or if the problem is uninteresting to help solve.
>>>
>>> Thanks again
>>>
>>> Joe
>>>
>>>
>>>
>>> On Sun, Jan 12, 2014 at 10:56 AM, Raul Miller <[email protected]> wrote:
>>>> I quite possibly misunderstood your specifications.
>>>>
>>>> If I simply remove lines 2 and 11 from my gist, calc2 still reports
>>>> three blocks. If I also remove the three blocks which appear between
>>>> lines 2 and 11, calc2 will then report 0 blocks. Is that not what you
>>>> wanted me to count?
>>>>
>>>> Meanwhile, I do not concern myself very much with whether the
>>>> boundaries of a region of text are "inside" or "outside" that region.
>>>> Instead, I go with what seems simple to implement and then use the
>>>> requirements to tweak the code so that the result is correct. Of
>>>> course, the limitation here is that I need to understand your
>>>> requirements. Another limitation is that new requirements will require
>>>> new code (or manual work) - but that seems to me to be unavoidable.
>>>>
>>>> I expect that once we share an understanding of your requirements that
>>>> an explanation of how the code is structured will make more sense.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Raul
>>>>
>>>>
>>>> On Sun, Jan 12, 2014 at 6:50 AM, Joe Bogner <[email protected]> wrote:
>>>>> Thanks for the sequential machine implementation.
>>>>> I tested with different versions of the text block and it doesn't
>>>>> work as I expected, which means I either relayed the requirements
>>>>> wrong or there may be a bug
>>>>>
>>>>> For example, if I take out the first block of @{ }, it reports
>>>>>
>>>>> calc2 text
>>>>> blocks 0
>>>>> max depth 1
>>>>> max block 25
>>>>> scripts 2
>>>>> max script 49
>>>>>
>>>>> text =: 0 : 0
>>>>> @{
>>>>> Response.Write('start');
>>>>> }
>>>>> <html>
>>>>> <script>
>>>>> alert('start');
>>>>> </script>
>>>>> <div id='Foo'>@Page.Foo</div>
>>>>> <script>
>>>>> alert($('#Foo').val());
>>>>> </script>
>>>>>
>>>>> </html>
>>>>> @{
>>>>> Response.Write('bye');
>>>>> }
>>>>> )
>>>>>
>>>>> My implementation posts the correct answer of two blocks - each pair
>>>>> of @{ and the } that gets back to indent = 0.
>>>>>
>>>>> It looks like yours requires possibly a brace in the block to trigger
>>>>> it as a code block. It also seems to be summing up the total amount
>>>>> of code and script characters instead of finding the largest one.
>>>>>
>>>>> The Trace looks helpful to debug.
>>>>>
>>>>> I've read through the dictionary and nuvoc a few times for sequential
>>>>> machine and I don't understand it well enough to help troubleshoot
>>>>> your implementation. I'll spend more time with it. I didn't want to go
>>>>> down that rabbit hole until I was sure it could provide a correct
>>>>> result.
>>>>>
>>>>> I thought about posting to programming but wasn't sure how
>>>>> philosophical it would get. Probably better to have started there and
>>>>> then migrate here if it was philosophical. Feel free to move it to
>>>>> programming since we're now on the details of the sequential machine
>>>>> implementation.
>>>>>
>>>>> Thanks again. I appreciate the opportunity to learn.
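
The counting rule stated in this message (a block is each @{ together with the } that brings the brace depth back to zero) can be written down as a short Python sketch. This is only an illustration of the rule, not either of the J implementations discussed in the thread; code_blocks is a hypothetical name, and braces inside strings or comments are not handled.

```python
def code_blocks(text):
    """Return the contents of each @{ ... } block in text.

    A block opens at "@{" and closes at the "}" that returns the brace
    depth to zero. Assumption: no braces appear in strings or comments.
    """
    blocks = []
    i = 0
    while True:
        start = text.find("@{", i)
        if start == -1:
            break
        depth = 0
        j = start + 1  # index of the "{" that belongs to "@{"
        while j < len(text):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        blocks.append(text[start + 2:j])
        i = j + 1
    return blocks


sample = """@{
Response.Write('start');
}
<html>
<script>
alert('start');
</script>
<div id='Foo'>@Page.Foo</div>
<script>
alert($('#Foo').val());
</script>

</html>
@{
Response.Write('bye');
}
"""
blocks = code_blocks(sample)
print(len(blocks))                  # 2
print(max(len(b) for b in blocks))  # size of the largest block, not the sum
```

Taking the maximum over the per-block lengths, rather than adding them up, is the distinction the message draws between "largest block" and "total amount of code".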
>>>>>
>>>>> On Sat, Jan 11, 2014 at 10:16 PM, Raul Miller <[email protected]> wrote:
>>>>>> Here's a draft that uses ;:
>>>>>>
>>>>>> https://gist.github.com/rdm/8380234
>>>>>>
>>>>>> (As an aside, perhaps this thread should be on programming? Or at
>>>>>> least, something to think about for next time...)
>>>>>>
>>>>>> Note that I get different character counts than you. Maybe I
>>>>>> misunderstood what you intended to count?
>>>>>>
>>>>>> Let me know if you want me to clarify or rewrite any of that.
>>>>>>
>>>>>> But, briefly, I am using the final states from a ;: trace to mark the
>>>>>> end of each "token" and then classifying the text based on that
>>>>>> analysis. Since this sequential machine is a bit bulky, I decided to
>>>>>> write a small application to build it rather than constructing it by
>>>>>> hand. Since I only care about the state trace, I use no-op for all
>>>>>> operations. Since I want the end state, I use 0 _1 0 0 for ijrd
>>>>>> instead of the default 0 _1 0 _1. This leaves me with my final state
>>>>>> being the "character position" after the last character in text (and
>>>>>> it's reported in the trace rather than being an error condition).
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Raul
>>>>>>
>>>>>> On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
>>>>>>> Thank you for the thoughts. You summarized it well.
>>>>>>>
>>>>>>> I don't need to worry about attributes on the script tag for this use
>>>>>>> case. I am interested in quantifying how much embedded javascript is
>>>>>>> in each of the pages. I don't need to quantify external scripts. I
>>>>>>> know the code base doesn't use the type="javascript" attribute
>>>>>>>
>>>>>>> The braces should be well formed otherwise the c# razor file wouldn't
>>>>>>> compile. It is possible there may be an edge case which can be found
>>>>>>> when I run it against all the files.
>>>>>>>
>>>>>>> I plan to use it to identify areas to refactor in the javascript/c#
>>>>>>> razor code base and then watch it improve over time. I also thought
>>>>>>> it would be interesting to use a concise and expressive language, J,
>>>>>>> to measure the more verbose code base. It doesn't need to be precise
>>>>>>> in terms of characters. For example, it is ok if the script tag
>>>>>>> characters are counted as long as it's consistent. I will be using it
>>>>>>> to find large problem areas and then measure the improvement.
>>>>>>>
>>>>>>> I would be interested in seeing the sequential machine approach or any
>>>>>>> other more idiomatic method than mine. I am fairly satisfied with
>>>>>>> mine. It is fairly clear to me and can likely be extended if needed.
>>>>>>> I am trying to use J more in my day to day and that would help me
>>>>>>> learn and hopefully would be an interesting example for others.
>>>>>>>
>>>>>>> Thanks again
>>>>>>> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
>>>>>>>
>>>>>>>> I think I see how I would do that with a sequential machine. Let me
>>>>>>>> know if you want a working example.
>>>>>>>>
>>>>>>>> Briefly, though, you seem to have three kinds of token pairs:
>>>>>>>>
>>>>>>>> @{ }
>>>>>>>> { }
>>>>>>>> <script> </script>
>>>>>>>>
>>>>>>>> The ambiguity between the first two is problematic, in the context of
>>>>>>>> errors, but does not matter in well formed cases. A bigger problem in
>>>>>>>> the wild might be that you do not allow for attributes on the script
>>>>>>>> tag.
>>>>>>>>
>>>>>>>> Also, you care about the number of characters between <script>
>>>>>>>> </script> so those characters should be saved as "tokens" even if they
>>>>>>>> are not curly braces. You care about {} between both @{ } and <script>
>>>>>>>> </script> and outside them, and your implementation allows things like
>>>>>>>> @{ <script> } </script>.
>>>>>>>>
>>>>>>>> A full wart-for-wart compatible version would be painful to write. A
>>>>>>>> version which assumed well-formed cases would be much easier to write.
>>>>>>>> But before thinking about coding up an implementation it's probably
>>>>>>>> worth thinking about why you want to do this. The answer to that kind
>>>>>>>> of question can be really interesting and can help identify which
>>>>>>>> warts are unnecessary or possibly even detrimental.
>>>>>>>>
>>>>>>>> So, before I think any more about code, what are your thoughts on what
>>>>>>>> you want to accomplish?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Raul
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
>>>>>>>> > I have about 300 code files (javascript and embedded code) that I
>>>>>>>> > want to collect some metrics on. I've written the algorithm using an
>>>>>>>> > imperative style. I actually wrote it first in C# and translated to J
>>>>>>>> >
>>>>>>>> > Here is the code (posted a link for brevity):
>>>>>>>> >
>>>>>>>> > J version:
>>>>>>>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
>>>>>>>> >
>>>>>>>> > C# version:
>>>>>>>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
>>>>>>>> >
>>>>>>>> > This is what it outputs
>>>>>>>> >
>>>>>>>> > calc''
>>>>>>>> > blocks 3
>>>>>>>> > max depth 2
>>>>>>>> > max block 113
>>>>>>>> > scripts 2
>>>>>>>> > max script 26
>>>>>>>> >
>>>>>>>> > Any suggestions on how to do it differently in J? I looked into the
>>>>>>>> > sequential machine some but couldn't figure out how to make it work
>>>>>>>> > (if it could) since my approach required knowledge of the brace
>>>>>>>> > depth.
>>>>>>>> >
>>>>>>>> > In terms of requirements:
>>>>>>>> > 1. Take a block of text
>>>>>>>> > 2. Identify the code blocks in the file (start with @{ and end with } )
>>>>>>>> > 3. Count the code blocks
>>>>>>>> > 4. Determine the max depth of the code block
>>>>>>>> > 5. Determine the max size of all the code blocks
>>>>>>>> > 6. Count the javascript blocks
>>>>>>>> > 7. Determine the max size of the javascript block
>>>>>>>> >
>>>>>>>> > Thanks for any feedback or input!
>>>>>>>> >
>>>>>>>> > Joe
>>>>>>>> > ----------------------------------------------------------------------
>>>>>>>> > For information about J forums see http://www.jsoftware.com/forums.htm
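
Requirements 6 and 7 from the original post (count the javascript blocks and find the largest one) are the simpler half of the problem, since <script> ... </script> pairs do not nest. A minimal Python sketch follows, for illustration only and not taken from either gist. The name script_blocks is hypothetical, and script tags are assumed to carry no attributes, as stated earlier in the thread.

```python
def script_blocks(text):
    """Return the text between each <script> and the next </script>.

    Assumption: the opening tag is exactly "<script>" with no attributes.
    An unterminated block runs to the end of the text.
    """
    open_tag, close_tag = "<script>", "</script>"
    blocks = []
    i = 0
    while True:
        start = text.find(open_tag, i)
        if start == -1:
            break
        body = start + len(open_tag)
        end = text.find(close_tag, body)
        if end == -1:
            end = len(text)
        blocks.append(text[body:end])
        i = end
    return blocks


page = ("<html><script>alert('start');</script><div>x</div>"
        "<script>alert($('#Foo').val());</script></html>")
scripts = script_blocks(page)
print(len(scripts))                  # 2
print(max(len(s) for s in scripts))  # size of the largest script body
```

Whether the tag characters themselves are included in the size is a free choice here; the thread notes that either is acceptable as long as it is consistent.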
