1234 in every row is *NOT* necessary though, ... (I thought I had corrected that before hitting send. I get too impatient when proofreading my own writing.)
Thanks,

--
Raul

On Sun, Jan 12, 2014 at 2:58 PM, Raul Miller <[email protected]> wrote:
> Yes, nicely done.
>
> And, really, 1234 could appear in every row of the States table. Right
> now it's only there for where tokens end, and that works because
> you're not going to be using malformed or partially formed script
> tags. 1234 in every row is necessary though, for your current examples
> and requirements.
>
> Thanks,
>
> --
> Raul
>
> On Sun, Jan 12, 2014 at 2:48 PM, Joe Bogner <[email protected]> wrote:
>> I think I have the state table figured out. I created a little image
>> to help explain it.
>>
>> http://imgur.com/JT7MAhj
>>
>> I may post this to the wiki as an example. Thanks again
>>
>> On Sun, Jan 12, 2014 at 1:49 PM, Joe Bogner <[email protected]> wrote:
>>> Great! Yes, that change counts the blocks the way I need it to.
>>>
>>> As you pointed out, the requirements weren't very well spec'd and led
>>> to ambiguities. I had thought the implementation was relatively clear
>>> and it wasn't a good assumption to think that it would be read. I
>>> think tests would have worked well to more clearly illustrate the
>>> expectations. I couldn't imagine writing a parser without them.
>>>
>>> Raul wrote:
>>>> A related issue is that your measure of "depth" did not include @{ } -
>>>> both immediately inside and immediately outside these tokens is depth
>>>> 0, if I understand properly what you meant by depth.
>>>
>>> Yes, that's fine. My goal was to measure how deep the nesting was. It
>>> doesn't matter to me if it's zero or one based.
>>>
>>> Thanks for the tips on how to incorporate logic for detecting { } in
>>> strings.
>>>
>>> We actually weren't that far off on understanding considering how
>>> little you needed to change to count blocks the way I had intended.
>>>
>>> The max blocks still isn't right, but that's OK. I will see if I can
>>> fix it or start writing some tests to demonstrate it better.
>>>
>>> text =: 0 : 0
>>> @{ if (foo) { } }
>>> )
>>>
>>> shows max block 2
>>>
>>> text =: 0 : 0
>>> @{ Response.Write("hi"); }
>>> )
>>>
>>> shows max block 0, which leads me to believe it needs a brace inside
>>> the code block to start counting. I would have assumed it would be
>>> some number of characters close to # ' Response.Write("hi"); '
>>>
>>> 23
>>>
>>> I should be able to figure out most of how you did it but I'm stumped
>>> on the State table. I think I understand classify except for one part
>>> (summarized below)
>>>
>>> classify=: 1}.Ends i. 2 {"1(5;(States,"+0);((<"+Chars),<a.-.Chars);0 _1 0 0) ;: ]
>>>
>>> Working right to left
>>>
>>> NB. I can figure this out later. Your explanation is good and the dictionary covers it
>>> ijrd =. 0 _1 0 0
>>>
>>> NB. The first part makes sense, the second part looks to be bunch of junk characters?
>>> m=.((<"+Chars),<a.-.Chars)
>>>
>>> In a trivial example, it looks like it classifies the rows the same way:
>>>
>>> m=.((<"+Chars))
>>> (y i.~;m) { (#m),~(#&>m)#i.#m NB. From the dictionary entry
>>>
>>> 0 1 12 12 12 5 9 12 12 5 12 12 12 7 8 10 12 12 12 12 8 12 12 12 12 2 12
>>>
>>> m=.((<"+Chars),<a.-.Chars)
>>> (y i.~;m) { (#m),~(#&>m)#i.#m NB. From the dictionary entry
>>>
>>> 0 1 12 12 12 5 9 12 12 5 12 12 12 7 8 10 12 12 12 12 8 12 12 12 12 2 12
>>>
>>> What is the purpose of <a.-.Chars? Is that "every other character"
>>> than what was specified?
>>>
>>> s=:(States,"+0)
>>>
>>> This adds the 0 operation to each of the States per your earlier note
>>> of using 0 to no-op
>>>
>>> f=: 5
>>> (f;s;m;ijrd) ;: text
>>>
>>> NB. extract the 3rd column from the trace
>>> cols=:2 {"1(f;s;m;ijrd) ;: text
>>> 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0
>>>
>>> NB. turn the col back into the token number from appendToken
>>> Ends i. cols
>>>
>>> I think I understand all of that.
>>>
>>> The State table has a shape of 21 13, which is the length of the text
>>> token we are looking for on the rows and length of the characters that
>>> make up those tokens on the columns.
>>>
>>> # '@{}<script></script>'
>>> 20
>>>
>>> # Chars
>>> 12
>>>
>>> I took a stab at adding the character and token on the x & y axis. I
>>> don't think I have it lined up quite right and I'm sure it doesn't
>>> look great on e-mail. If you can help decrypt the table that would be
>>> helpful as I am not following completely what appendToken is doing to
>>> build it.
>>>
>>>      @  {  }  <  /  s  c  r  i  p  t  >
>>> @    1  2  3  4  0  0  0  0  0  0  0  0  0
>>> {    1  2  3  4  0  0  0  0  0  0  0  0  0
>>> }    1  2  3  4  0  0  0  0  0  0  0  0  0
>>> <    1  2  3  4  0  0  0  0  0  0  0  0  0
>>> s    0  0  0  0 13  5  0  0  0  0  0  0  0
>>> c    0  0  0  0  0  0  6  0  0  0  0  0  0
>>> r    0  0  0  0  0  0  0  7  0  0  0  0  0
>>> i    0  0  0  0  0  0  0  0  8  0  0  0  0
>>> p    0  0  0  0  0  0  0  0  0  9  0  0  0
>>> t    0  0  0  0  0  0  0  0  0  0 10  0  0
>>> >    0  0  0  0  0  0  0  0  0  0  0 11  0
>>> <    1  2  3  4  0  0  0  0  0  0  0  0  0
>>> /    0  0  0  0  0  0  0  0  0  0  0  0  0
>>> s    0  0  0  0  0 14  0  0  0  0  0  0  0
>>> c    0  0  0  0  0  0 15  0  0  0  0  0  0
>>> r    0  0  0  0  0  0  0 16  0  0  0  0  0
>>> i    0  0  0  0  0  0  0  0 17  0  0  0  0
>>> p    0  0  0  0  0  0  0  0  0 18  0  0  0
>>> t    0  0  0  0  0  0  0  0  0  0 19  0  0
>>> >    0  0  0  0  0  0  0  0  0  0  0 20  0
>>>      1  2  3  4  0  0  0  0  0  0  0  0  0
>>>
>>> States 5,6,7,8,9,10,11 must be used to track <script>, and
>>> 14,15,16,17,18,19,20 to track </script>.
>>>
>>> Not sure why there isn't a state 12
>>>
>>> Any guidance on the table would be appreciated. This is really cool.
>>> Thanks again
>>>
>>>
>>> On Sun, Jan 12, 2014 at 12:16 PM, Raul Miller <[email protected]> wrote:
>>>> Ok...
>>>>
>>>> Translating what I think you are saying into implementation, I think
>>>> you want to change
>>>>
>>>> smoutput 'blocks ',":Left -&(+/) Codes
>>>>
>>>> to
>>>>
>>>> smoutput 'blocks ',":+/ Codes
>>>>
>>>> "Left" was bits marking left curly brackets (there were 6 in your
>>>> sample text) while Codes was bits marking instances of @{ (there were
>>>> 3 in your sample text).
>>>>
>>>> A related issue is that your measure of "depth" did not include @{ } -
>>>> both immediately inside and immediately outside these tokens is depth
>>>> 0, if I understand properly what you meant by depth.
>>>>
>>>> You will note, here, that I did not actually read your code very
>>>> closely - that is because I was more interested in paraphrasing it
>>>> than in copying it, and that means understanding what you were
>>>> thinking more than understanding what you implemented. We sometimes
>>>> approximate this process using requirements, sometimes using tests and
>>>> perhaps in a variety of other ways.
>>>>
>>>> Also, it might help you to understand the code better if you replaced
>>>> every =. in calc2 with =: (=. is great for isolating internal
>>>> definitions in explicit verbs, but =: is much better for making things
>>>> visible or ... explicit?).
>>>>
>>>> That said, we can exclude { and } which appear in irrelevant contexts
>>>> by first declaring what those contexts are (double quoted strings?
>>>> multi-line comments? single line comments?) and then adjusting the
>>>> definition of State to distinguish them from the recognized instances
>>>> of { and }.
>>>>
>>>> Let's say that I wanted to exclude { in double quoted strings. Here's
>>>> an outline:
>>>>
>>>> (1) Include " in the definition of Chars
>>>> (2) Introduce a new routine appendTokenPair which works like
>>>> appendToken but leaves the sequential machine in an alternate state
>>>> until receiving a second token.
>>>> (3) use this new routine to include " ... " in our definition of State.
>>>>
>>>> Once this was working, using it for /* ... */ should be trivial,
>>>> though the use of multi-character tokens might be an issue, depending
>>>> on how appendTokenPair was implemented.
>>>>
>>>> The thing you need to watch out for, when working with parsers, is
>>>> ambiguities. In this example, we had an ambiguity between @{ and {
>>>> where hypothetically speaking they might be confused. This was one of
>>>> my motivations for focusing on requirements instead of simply diving
>>>> into the implementation.
>>>>
>>>> Being able to move from implementation to specification is not easy -
>>>> I love focusing on the computer and I sometimes find human
>>>> interactions painful (I do not like bothering people and while I might
>>>> occasionally enjoy getting yelled at I find I need to do something to
>>>> please people yelling at me after - or at least something I am
>>>> comfortable interpreting as pleasing - going away seems to count,
>>>> somehow. Mostly, though, I have a lot of respect for heads-down focus,
>>>> even when it's taken too far.)
>>>>
>>>> Does this make sense?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Raul
>>>>
>>>>
>>>> On Sun, Jan 12, 2014 at 11:13 AM, Joe Bogner <[email protected]> wrote:
>>>>> Sorry about that. My requirements were based on more contextual
>>>>> knowledge than it probably should have.
>>>>>
>>>>> To take a step back:
>>>>>
>>>>> In the c#/razor template language, each code block is delimited by:
>>>>>
>>>>> @{
>>>>>
>>>>> }
>>>>>
>>>>> Within a block, you can add c# code to perform any functions of your
>>>>> page necessary
>>>>>
>>>>> @{
>>>>>     if (Post) {
>>>>>         Save();
>>>>>     } else {
>>>>>         DoSomethingElse();
>>>>>     }
>>>>> }
>>>>>
>>>>> A page can have multiple code blocks. And a code block can have an
>>>>> infinite depth of branching, denoted by { }
>>>>>
>>>>> Poor code would have many blocks, or very large blocks or very deep
>>>>> nesting.
>>>>>
>>>>> @{
>>>>>     if (Post) {
>>>>>         if (Monday) {
>>>>>             if (After5PM) {
>>>>>                 if (Before8PM) {
>>>>>                     Save();
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>
>>>>>     } else {
>>>>>         DoSomethingElse();
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> A code block is pairs of @{ } where } terminates after the branch
>>>>> level is zero. Let me know if that's not clear enough. Other
>>>>> templating languages like php make it easier.
>>>>>
>>>>> <? php
>>>>>
>>>>> if (Foo) {
>>>>>
>>>>> }
>>>>>
>>>>> ?>
>>>>> <html>foo</html>
>>>>>
>>>>>
>>>>> In PHP, you wouldn't need to worry about the curly brace depth for
>>>>> determining code block start and end. It could be split on <?php ?>
>>>>>
>>>>> In razor, the @{ is the same as <? and the } when brace depth is zero
>>>>> terminates the block
>>>>>
>>>>> So I don't have an exact specification that I'm working towards. I'm
>>>>> just trying to find out how many @{ } code blocks there are, how
>>>>> deeply nested the code within is, and how large the largest block is.
>>>>> For example, if it's more than 20 lines or X characters, it probably
>>>>> belongs in a separate class or file
>>>>>
>>>>> Of course an edge case that would blow up would be if the code block
>>>>> has a brace in a string
>>>>>
>>>>> @{
>>>>>     if (Post) {
>>>>>         Response.Write("will break a simple parser } } }} ");
>>>>>     }
>>>>> }
>>>>>
>>>>> I don't think that would be extensive in this code. It's not going to
>>>>> be used for anything of a critical nature other than to help improve
>>>>> my personal code base - so if there are false positives or errors it's
>>>>> OK. I'm looking for a "good enough" solution.
>>>>>
>>>>> Hope that helps. Feel free to cancel if I'm not getting progressively
>>>>> more clear or if the problem is uninteresting to help solve.
>>>>>
>>>>> Thanks again
>>>>>
>>>>> Joe
>>>>>
>>>>>
>>>>> On Sun, Jan 12, 2014 at 10:56 AM, Raul Miller <[email protected]> wrote:
>>>>>> I quite possibly misunderstood your specifications.
>>>>>>
>>>>>> If I simply remove lines 2 and 11 from my gist, calc2 still reports
>>>>>> three blocks. If I also remove the three blocks which appear between
>>>>>> lines 2 and 11, calc2 will then report 0 blocks. Is that not what you
>>>>>> wanted me to count?
>>>>>>
>>>>>> Meanwhile, I do not concern myself very much with whether the
>>>>>> boundaries of a region of text are "inside" or "outside" that region.
>>>>>> Instead, I go with what seems simple to implement and then use the
>>>>>> requirements to tweak the code so that the result is correct. Of
>>>>>> course, the limitation here is that I need to understand your
>>>>>> requirements. Another limitation is that new requirements will require
>>>>>> new code (or manual work) - but that seems to me to be unavoidable.
>>>>>>
>>>>>> I expect that once we share an understanding of your requirements that
>>>>>> an explanation of how the code is structured will make more sense.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Raul
>>>>>>
>>>>>>
>>>>>> On Sun, Jan 12, 2014 at 6:50 AM, Joe Bogner <[email protected]> wrote:
>>>>>>> Thanks for the sequential machine implementation.
>>>>>>> I tested with different versions of the text block and it doesn't
>>>>>>> work as I expected, which means I either relayed the requirements
>>>>>>> wrong or there may be a bug
>>>>>>>
>>>>>>> For example, if I take out the first block of @{ }, it reports
>>>>>>>
>>>>>>> calc2 text
>>>>>>> blocks 0
>>>>>>> max depth 1
>>>>>>> max block 25
>>>>>>> scripts 2
>>>>>>> max script 49
>>>>>>>
>>>>>>> text =: 0 : 0
>>>>>>> @{
>>>>>>> Response.Write('start');
>>>>>>> }
>>>>>>> <html>
>>>>>>> <script>
>>>>>>> alert('start');
>>>>>>> </script>
>>>>>>> <div id='Foo'>@Page.Foo</div>
>>>>>>> <script>
>>>>>>> alert($('#Foo').val());
>>>>>>> </script>
>>>>>>>
>>>>>>> </html>
>>>>>>> @{
>>>>>>> Response.Write('bye');
>>>>>>> }
>>>>>>> )
>>>>>>>
>>>>>>> My implementation posts the correct answer of two blocks - each pair
>>>>>>> of @{ and the } that gets back to indent = 0.
>>>>>>>
>>>>>>> It looks like yours requires possibly a brace in the block to trigger
>>>>>>> it as a code block. It also seems to be summing up the total amount
>>>>>>> of code and script characters instead of finding the largest one.
>>>>>>>
>>>>>>> The Trace looks helpful to debug.
>>>>>>>
>>>>>>> I've read through the dictionary and nuvoc a few times for sequential
>>>>>>> machine and I don't understand it well enough to help troubleshoot
>>>>>>> your implementation. I'll spend more time with it. I didn't want to go
>>>>>>> down that rabbit hole until I was sure it could provide a correct
>>>>>>> result.
>>>>>>>
>>>>>>> I thought about posting to programming but wasn't sure how
>>>>>>> philosophical it would get. Probably better to have started there and
>>>>>>> then migrate here if it was philosophical. Feel free to move it to
>>>>>>> programming since we're now on the details of the sequential machine
>>>>>>> implementation.
>>>>>>>
>>>>>>> Thanks again. I appreciate the opportunity to learn.
>>>>>>>
>>>>>>> On Sat, Jan 11, 2014 at 10:16 PM, Raul Miller <[email protected]> wrote:
>>>>>>>> Here's a draft that uses ;:
>>>>>>>>
>>>>>>>> https://gist.github.com/rdm/8380234
>>>>>>>>
>>>>>>>> (As an aside, perhaps this thread should be on programming? Or at
>>>>>>>> least, something to think about for next time...)
>>>>>>>>
>>>>>>>> Note that I get different character counts than you. Maybe I
>>>>>>>> misunderstood what you intended to count?
>>>>>>>>
>>>>>>>> Let me know if you want me to clarify or rewrite any of that.
>>>>>>>>
>>>>>>>> But, briefly, I am using the final states from a ;: trace to mark the
>>>>>>>> end of each "token" and then classifying the text based on that
>>>>>>>> analysis. Since this sequential machine is a bit bulky, I decided to
>>>>>>>> write a small application to build it rather than constructing it by
>>>>>>>> hand. Since I only care about the state trace, I use no-op for all
>>>>>>>> operations. Since I want the end state, I use 0 _1 0 0 for ijrd
>>>>>>>> instead of the default 0 _1 0 _1. This leaves me with my final state
>>>>>>>> being the "character position" after the last character in text (and
>>>>>>>> it's reported in the trace rather than being an error condition).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Raul
>>>>>>>>
>>>>>>>> On Sat, Jan 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
>>>>>>>>> Thank you for the thoughts. You summarized it well.
>>>>>>>>>
>>>>>>>>> I don't need to worry about attributes on the script tag for this use
>>>>>>>>> case. I am interested in quantifying how much embedded javascript is
>>>>>>>>> in each of the pages. I don't need to quantify external scripts. I
>>>>>>>>> know the code base doesn't use the type="javascript" attribute
>>>>>>>>>
>>>>>>>>> The braces should be well formed otherwise the c# razor file wouldn't
>>>>>>>>> compile.
>>>>>>>>> It is possible there may be an edge case which can be found when I
>>>>>>>>> run it against all the files.
>>>>>>>>>
>>>>>>>>> I plan to use it to identify areas to refactor in the javascript/c#
>>>>>>>>> razor code base and then watch it improve over time. I also thought it
>>>>>>>>> would be interesting to use a concise and expressive language, J, to
>>>>>>>>> measure the more verbose code base. It doesn't need to be precise in
>>>>>>>>> terms of characters. For example, it is ok if the script tag
>>>>>>>>> characters are counted as long as it's consistent. I will be using it
>>>>>>>>> to find large problem areas and then measure the improvement.
>>>>>>>>>
>>>>>>>>> I would be interested in seeing the sequential machine approach or any
>>>>>>>>> other more idiomatic method than mine. I am fairly satisfied with
>>>>>>>>> mine. It is fairly clear to me and can likely be extended if needed. I
>>>>>>>>> am trying to use J more in my day to day and that would help me learn
>>>>>>>>> and hopefully would be an interesting example for others.
>>>>>>>>>
>>>>>>>>> Thanks again
>>>>>>>>> On Jan 11, 2014 4:11 PM, "Raul Miller" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I think I see how I would do that with a sequential machine. Let me
>>>>>>>>>> know if you want a working example.
>>>>>>>>>>
>>>>>>>>>> Briefly, though, you seem to have three kinds of token pairs:
>>>>>>>>>>
>>>>>>>>>> @{ }
>>>>>>>>>> { }
>>>>>>>>>> <script> </script>
>>>>>>>>>>
>>>>>>>>>> The ambiguity between the first two is problematic, in the context of
>>>>>>>>>> errors, but does not matter in well formed cases. A bigger problem in
>>>>>>>>>> the wild might be that you do not allow for attributes on the script
>>>>>>>>>> tag.
>>>>>>>>>>
>>>>>>>>>> Also, you care about the number of characters between <script>
>>>>>>>>>> </script> so those characters should be saved as "tokens" even if
>>>>>>>>>> they are not curly braces. You care about {} between both @{ } and
>>>>>>>>>> <script> </script> and outside them, and your implementation allows
>>>>>>>>>> things like @{ <script> } </script>.
>>>>>>>>>>
>>>>>>>>>> A full wart-for-wart compatible version would be painful to write. A
>>>>>>>>>> version which assumed well-formed cases would be much easier to
>>>>>>>>>> write. But before thinking about coding up an implementation it's
>>>>>>>>>> probably worth thinking about why you want to do this. The answer to
>>>>>>>>>> that kind of question can be really interesting and can help identify
>>>>>>>>>> which warts are unnecessary or possibly even detrimental.
>>>>>>>>>>
>>>>>>>>>> So, before I think any more about code, what are your thoughts on
>>>>>>>>>> what you want to accomplish?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Raul
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Jan 11, 2014 at 3:40 PM, Joe Bogner <[email protected]> wrote:
>>>>>>>>>> > I have about 300 code files (javascript and embedded code) that I
>>>>>>>>>> > want to collect some metrics on. I've written the algorithm using an
>>>>>>>>>> > imperative style.
>>>>>>>>>> > I actually wrote it first in C# and translated to J
>>>>>>>>>> >
>>>>>>>>>> > Here is the code (posted a link for brevity):
>>>>>>>>>> >
>>>>>>>>>> > J version:
>>>>>>>>>> > https://gist.github.com/joebo/936ca5e2017c0a3b5c56
>>>>>>>>>> >
>>>>>>>>>> > C# version:
>>>>>>>>>> > https://gist.github.com/joebo/e7f8e3ca7bd21117e58d
>>>>>>>>>> >
>>>>>>>>>> > This is what it outputs
>>>>>>>>>> >
>>>>>>>>>> > calc''
>>>>>>>>>> > blocks 3
>>>>>>>>>> > max depth 2
>>>>>>>>>> > max block 113
>>>>>>>>>> > scripts 2
>>>>>>>>>> > max script 26
>>>>>>>>>> >
>>>>>>>>>> > Any suggestions on how to do it differently in J? I looked into the
>>>>>>>>>> > sequential machine some but couldn't figure out how to make it work
>>>>>>>>>> > (if it could) since my approach required knowledge of the brace
>>>>>>>>>> > depth.
>>>>>>>>>> >
>>>>>>>>>> > In terms of requirements:
>>>>>>>>>> > 1. Take a block of text
>>>>>>>>>> > 2. Identify the code blocks in the file (start with @{ and end
>>>>>>>>>> > with } )
>>>>>>>>>> > 3. Count the code blocks
>>>>>>>>>> > 4. Determine the max depth of the code block
>>>>>>>>>> > 5. Determine the max size of all the code blocks
>>>>>>>>>> > 6. Count the javascript blocks
>>>>>>>>>> > 7. Determine the max size of the javascript block
>>>>>>>>>> >
>>>>>>>>>> > Thanks for any feedback or input!
>>>>>>>>>> >
>>>>>>>>>> > Joe
>>>>>>>>>> > ----------------------------------------------------------------------
>>>>>>>>>> > For information about J forums see http://www.jsoftware.com/forums.htm
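
For readers following requirements 3 and 4 from the original post, here is a minimal sketch of the brace-depth idea in plain J, without the sequential machine. It is not the code from either gist. The names opens, closes, depth, maxdepth and blocks are made up for this illustration. It assumes well-formed input with no braces inside strings or comments (the edge case Joe mentions), and it counts the brace of @{ as level 1, so the depth is one-based compared with the measure discussed above.

```j
NB. Sample text taken from the thread.
text =: 0 : 0
@{ if (foo) { } }
)

NB. Hypothetical helper names, not from either gist.
opens  =: '{' = text          NB. 1 at every left brace, including the one in @{
closes =: '}' = text          NB. 1 at every right brace
depth  =: +/\ opens - closes  NB. running nesting depth after each character

maxdepth =: >./ depth         NB. deepest nesting reached (one-based)
blocks   =: +/ '@{' E. text   NB. number of @{ openers

smoutput 'blocks ', ": blocks
smoutput 'max depth ', ": maxdepth
```

For this sample it prints blocks 1 and max depth 2. Subtracting 1 from maxdepth gives the zero-based figure in which the region directly inside @{ } is depth 0. Block sizes and the <script> counts are left out, since they need the start and end positions of each block, which is where the state trace from ;: is used in the thread.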
