I've been updating work on my adverb for working with large files - http://www.jsoftware.com/jwiki/DevonMcCormick/Code/WorkOnLargeFiles - that may provide a general way to apply an arbitrary verb across a large file.
On Tue, Feb 3, 2015 at 4:25 PM, Raul Miller <[email protected]> wrote: > Another approach, depending on what the processing looks like, might > involve selecting indices based on something smaller than nine > characters. For example, if certain character pairs were "valid" or > maybe even "better than others" you could get indices which select > those pairs and further refine from there. > > But the blocked approach can also be a very good one. (Just make sure > to handle infixes which span block boundaries.) > > Thanks, > > -- > Raul > > On Tue, Feb 3, 2015 at 3:54 PM, Robert Bernecky > <[email protected]> wrote: > > I agree that you need to have more than one cell on hand to do the nub! > > > > However, try this approach. It's crude, but it might match or beat C: > > > > - decide how much input you can process at once. > > - Result =. '' > > - while more hunks, loop over: > > - hunkresult = nub-of-infix on current hunk of the input. > > - Result = nub Result,hunkresult > > > > This lets J do most of the heavy lifting, while keeping you > > within the memory constraints. > > > > Bob > > > > On 15-02-03 03:45 PM, Joe Bogner wrote: > >> > >> Sorry for the double email, I realized I replied to you directly. I'm > >> replying to you directly since you replied to me directly, just in > >> case you didn't want to send it to chat... > >> > >> In any case, the whole goal is to get the nub of the 9 character > >> infixes. There is special code for some of the mathematical operations > >> on infixes (such as sum) that don't generate all the infixes, however > >> I having found anything for nub (distinct), which makes some sense > >> since it can't be reduced to an atom like the math operations > >> > >> Thanks for the ideas > >> > >> > >> On Tue, Feb 3, 2015 at 3:42 PM, Robert Bernecky > >> <[email protected]> wrote: > >>> > >>> "9 character infixes" - that looks like an input to a convolution, does > >>> it > >>> not? > >>> If so, then there should be some J dohickey (adverb or conjunction) > >>> (My J is very very rusty, and likely obsolete...) > >>> to do that that will invoke your kernel verb without generating all the > >>> infixes. > >>> > >>> By convolution here, I mean something like moving-window inner > >>> product, string search, or, err, convolution. > >>> > >>> What processing do you apply to each infix? > >>> > >>> Bob > >>> > >>> > >>> > >>> On 15-02-03 03:36 PM, Joe Bogner wrote: > >>>> > >>>> Thanks Bob. I agree that compiling often helps. I have previously > >>>> kicked around the idea of generating byte code or machine code and > >>>> interpreting it or executing it. I have called dlls in other > >>>> circumstances, but it would be nice to have everything in J. > >>>> > >>>> In this case, the challenge put a 2GB constraint on memory and 60 > >>>> seconds of execution time. It's an artificial constraint, which is why > >>>> I posted to chat. Those constraint wouldn't exist in the real world. > >>>> > >>>> I'm scanning a 240MB text file and generating 9 character infixes and > >>>> processing them. I hit the memory limit using 9]\ so I figured I'd try > >>>> looping and using memr (15!:1). In an earlier version, about 270 > >>>> seconds were spent in the looping construct and 40 seconds in memr. It > >>>> seemed like there should be a faster looping construct > >>>> > >>>> Here's a simple version: > >>>> > >>>> txt=: a. {~ (97&+(i. 26)) > >>>> addr=: mema (# txt) > >>>> txt memw addr,0,(#txt) > >>>> 6!:2 '([:<:([:0&{::(] ; smoutput&([:15!:1(addr,9,~<:@:])) ))) ^:] > >>>> (<:#txt)' > >>>> > >>>> > >>>> yz > >>>> xyz > >>>> wxyz > >>>> vwxyz > >>>> uvwxyz > >>>> tuvwxyz > >>>> stuvwxyz > >>>> rstuvwxyz > >>>> qrstuvwxy > >>>> pqrstuvwx > >>>> opqrstuvw > >>>> nopqrstuv > >>>> mnopqrstu > >>>> lmnopqrst > >>>> klmnopqrs > >>>> jklmnopqr > >>>> ijklmnopq > >>>> hijklmnop > >>>> ghijklmno > >>>> fghijklmn > >>>> efghijklm > >>>> defghijkl > >>>> cdefghijk > >>>> bcdefghij > >>>> abcdefghi > >>>> > >>>> > >>>> I would need more processing of the results, but you can see roughly > >>>> how it compares to 9]\ > >>>> > >>>> 9[\txt > >>>> abcdefghi > >>>> bcdefghij > >>>> cdefghijk > >>>> defghijkl > >>>> efghijklm > >>>> fghijklmn > >>>> ghijklmno > >>>> hijklmnop > >>>> ijklmnopq > >>>> jklmnopqr > >>>> klmnopqrs > >>>> lmnopqrst > >>>> mnopqrstu > >>>> nopqrstuv > >>>> opqrstuvw > >>>> pqrstuvwx > >>>> qrstuvwxy > >>>> rstuvwxyz > >>>> > >>>> > >>>> > >>>> > >>>> On Tue, Feb 3, 2015 at 3:05 PM, Robert Bernecky > >>>> <[email protected]> wrote: > >>>>> > >>>>> Compiling often helps. When I compile looping, scalar-dominated > >>>>> APL code, I generally get about a factor of 1000X speedup. > >>>>> You should get the same sort of speedup with compiled J. > >>>>> [No, I do not have a J compiler, but will be happy to create one > >>>>> if funding were to materialize.] > >>>>> > >>>>> However, you mentioned that you were "looping due to memory > >>>>> constraints...". Is it possible that a different algorithm would let > >>>>> you do less looping? E.g., looping over 1e4 elements at once, > >>>>> or using a different approach, such as dynamic programming? > >>>>> > >>>>> Bob > >>>>> > >>>>> > >>>>> On 15-02-03 02:53 PM, Joe Bogner wrote: > >>>>>> > >>>>>> I'm playing around with a coding challenge that I'm solving with a > >>>>>> loop that needs to execute over 200 million times. > >>>>>> > >>>>>> Setting aside whether that's a smart approach, what is the quickest > >>>>>> looping construct in J? I realize that J is not optimized for this > >>>>>> style of programming, but I'm still curious if it can be done > >>>>>> efficiently. > >>>>>> > >>>>>> The quickest I've found takes about 7 seconds to iterate 100 million > >>>>>> times, whereas the same loop in C takes 0.03 seconds. > >>>>>> > >>>>>> 6!:2 '(] [ ])^:] 100e6' > >>>>>> 7.00952 > >>>>>> > >>>>>> The train is taking up some time, but eventually I will need it > >>>>>> > >>>>>> No train - 2.7 seconds > >>>>>> > >>>>>> 6!:2 ']^:] 100e6' > >>>>>> 2.70656 > >>>>>> > >>>>>> > >>>>>> Explicit: > >>>>>> > >>>>>> loop=: 3 : 0 > >>>>>> ctr=:0 > >>>>>> while. 1 do. > >>>>>> ctr=:>:ctr > >>>>>> if. y < ctr do. > >>>>>> smoutput 'done' > >>>>>> return. > >>>>>> end. > >>>>>> end. > >>>>>> ) > >>>>>> > >>>>>> 6!:2 'loop 100e6' > >>>>>> 122.48 > >>>>>> > >>>>>> Clearly explicit isn't the way to go > >>>>>> > >>>>>> Back to the tacit: > >>>>>> > >>>>>> log=:smoutput bind ] > >>>>>> > >>>>>> ([: <: [: 0&{:: ] ; log )^:] 10 > >>>>>> 10 > >>>>>> 9 > >>>>>> 8 > >>>>>> 7 > >>>>>> 6 > >>>>>> 5 > >>>>>> 4 > >>>>>> 3 > >>>>>> 2 > >>>>>> 1 > >>>>>> 0 > >>>>>> > >>>>>> I would replace log with my function that uses the counter value > >>>>>> > >>>>>> That's pretty slow though on 100 million iterations - 88 seconds > >>>>>> > >>>>>> 6!:2 '([: <: [: 0&{:: ] ; ] )^:] 100e6' > >>>>>> 88.3644 > >>>>>> > >>>>>> Let's try a new log > >>>>>> > >>>>>> log=: ] <:@:[ (smoutput bind ]) > >>>>>> log 55 > >>>>>> 55 > >>>>>> 54 > >>>>>> > >>>>>> > >>>>>> That looks like it could work. Let's put a dummy in for now > >>>>>> > >>>>>> log=: ] <:@:[ ] > >>>>>> > >>>>>> 6!:2 'log^:] 100e6' > >>>>>> 40.9542 > >>>>>> > >>>>>> Still 40 seconds... Any ideas on how to speed up iteration like > this? > >>>>>> > >>>>>> > >>>>>> Sidenote: I'm looping due to memory constraints placed on the coding > >>>>>> challenge > >>>>>> > ---------------------------------------------------------------------- > >>>>>> For information about J forums see > http://www.jsoftware.com/forums.htm > >>>>>> > >>>>> -- > >>>>> Robert Bernecky > >>>>> Snake Island Research Inc > >>>>> 18 Fifth Street > >>>>> Ward's Island > >>>>> Toronto, Ontario M5J 2B9 > >>>>> > >>>>> [email protected] > >>>>> tel: +1 416 203 0854 > >>>>> > >>> > >>> -- > >>> Robert Bernecky > >>> Snake Island Research Inc > >>> 18 Fifth Street > >>> Ward's Island > >>> Toronto, Ontario M5J 2B9 > >>> > >>> [email protected] > >>> tel: +1 416 203 0854 > >>> > > > > > > -- > > Robert Bernecky > > Snake Island Research Inc > > 18 Fifth Street > > Ward's Island > > Toronto, Ontario M5J 2B9 > > > > [email protected] > > tel: +1 416 203 0854 > > > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > -- Devon McCormick, CFA ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
