I've been updating my adverb for working with large files -
http://www.jsoftware.com/jwiki/DevonMcCormick/Code/WorkOnLargeFiles - which
may provide a general way to apply an arbitrary verb across a large file.
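
For reference, the blocked approach that Bob and Raul describe in the quoted
thread below could be sketched roughly as follows. This is only an
illustration, not the code on the wiki page: the name nubInfixesBlocked and
its left argument (window width, block length) are made up for this example,
and it works on a character list already in memory rather than reading a
file. Consecutive blocks overlap by one less than the window width, so
infixes that span a block boundary are still seen.

```j
NB. Hypothetical sketch: nub of the width-w infixes of y, one block at a time.
NB. x is (window width, block length); block length is assumed to exceed width.
nubInfixesBlocked=: 4 : 0
'w blk'=. x
res=. (0,w)$' '                 NB. empty table with w columns
start=. 0
while. (start+w) <: #y do.
  hunk=. (blk <. start -~ #y) {. start }. y
  res=. ~. res , w ]\ hunk      NB. nub of the running result and this hunk
  start=. start + blk - w - 1   NB. advance by blk-(w-1), so blocks overlap by w-1
end.
res
)

txt=: a. {~ 97 + i. 26
NB. compare against the unblocked version on a small input
(~. 9 ]\ txt) -: 9 12 nubInfixesBlocked txt
```

For a real file, each hunk would be read from disk instead of being cut from
y, but the bookkeeping for the overlap would stay the same.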

On Tue, Feb 3, 2015 at 4:25 PM, Raul Miller <[email protected]> wrote:

> Another approach, depending on what the processing looks like, might
> involve selecting indices based on something smaller than nine
> characters. For example, if certain character pairs were "valid" or
> maybe even "better than others" you could get indices which select
> those pairs and further refine from there.
>
> But the blocked approach can also be a very good one. (Just make sure
> to handle infixes which span block boundaries.)
>
> Thanks,
>
> --
> Raul
>
> On Tue, Feb 3, 2015 at 3:54 PM, Robert Bernecky
> <[email protected]> wrote:
> > I agree that you need to have more than one cell on hand to do the nub!
> >
> > However,  try this approach. It's crude, but it might match or beat C:
> >
> >    - decide how much input you can process at once.
> >    - Result =. ''
> >    - while more hunks, loop over:
> >        -    hunkresult = nub-of-infix on current hunk of the input.
> >        -    Result = nub Result,hunkresult
> >
> > This lets J do most of the heavy lifting, while keeping you
> > within the memory constraints.
> >
> > Bob
> >
> > On 15-02-03 03:45 PM, Joe Bogner wrote:
> >>
> >> Sorry for the double email, I realized I replied to you directly. I'm
> >> replying to you directly since you replied to me directly, just in
> >> case you didn't want to send it to chat...
> >>
> >> In any case, the whole goal is to get the nub of the 9 character
> >> infixes. There is special code for some of the mathematical operations
> >> on infixes (such as sum) that don't generate all the infixes, however
> >> I haven't found anything for nub (distinct), which makes some sense
> >> since it can't be reduced to an atom like the math operations.
> >>
> >> Thanks for the ideas
> >>
> >>
> >> On Tue, Feb 3, 2015 at 3:42 PM, Robert Bernecky
> >> <[email protected]> wrote:
> >>>
> >>> "9 character infixes" - that looks like an input to a convolution,
> >>> does it not?
> >>> If so, then there should be some J doohickey (adverb or conjunction)
> >>>   (My J is very very rusty, and likely obsolete...)
> >>> to do that, one that will invoke your kernel verb without generating
> >>> all the infixes.
> >>>
> >>> By convolution here, I mean something like moving-window inner
> >>> product, string search, or, err, convolution.
> >>>
> >>> What processing do you apply to each infix?
> >>>
> >>> Bob
> >>>
> >>>
> >>>
> >>> On 15-02-03 03:36 PM, Joe Bogner wrote:
> >>>>
> >>>> Thanks Bob. I agree that compiling often helps. I have previously
> >>>> kicked around the idea of generating byte code or machine code and
> >>>> interpreting it or executing it. I have called dlls in other
> >>>> circumstances, but it would be nice to have everything in J.
> >>>>
> >>>> In this case, the challenge put a 2GB constraint on memory and 60
> >>>> seconds of execution time. It's an artificial constraint, which is why
> >>>> I posted to chat. Those constraints wouldn't exist in the real world.
> >>>>
> >>>> I'm scanning a 240MB text file and generating 9 character infixes and
> >>>> processing them. I hit the memory limit using 9]\ so I figured I'd try
> >>>> looping and using memr (15!:1). In an earlier version, about 270
> >>>> seconds were spent in the looping construct and 40 seconds in memr. It
> >>>> seemed like there should be a faster looping construct
> >>>>
> >>>> Here's a simple version:
> >>>>
> >>>> txt=: a. {~ (97&+(i. 26))
> >>>> addr=: mema (# txt)
> >>>> txt memw addr,0,(#txt)
> >>>> 6!:2 '([:<:([:0&{::(] ; smoutput&([:15!:1(addr,9,~<:@:]))    ))) ^:] (<:#txt)'
> >>>>
> >>>>
> >>>>      yz
> >>>> xyz
> >>>> wxyz
> >>>> vwxyz
> >>>> uvwxyz
> >>>> tuvwxyz
> >>>> stuvwxyz
> >>>> rstuvwxyz
> >>>> qrstuvwxy
> >>>> pqrstuvwx
> >>>> opqrstuvw
> >>>> nopqrstuv
> >>>> mnopqrstu
> >>>> lmnopqrst
> >>>> klmnopqrs
> >>>> jklmnopqr
> >>>> ijklmnopq
> >>>> hijklmnop
> >>>> ghijklmno
> >>>> fghijklmn
> >>>> efghijklm
> >>>> defghijkl
> >>>> cdefghijk
> >>>> bcdefghij
> >>>> abcdefghi
> >>>>
> >>>>
> >>>> I would need more processing of the results, but you can see roughly
> >>>> how it compares to 9]\
> >>>>
> >>>> 9[\txt
> >>>> abcdefghi
> >>>> bcdefghij
> >>>> cdefghijk
> >>>> defghijkl
> >>>> efghijklm
> >>>> fghijklmn
> >>>> ghijklmno
> >>>> hijklmnop
> >>>> ijklmnopq
> >>>> jklmnopqr
> >>>> klmnopqrs
> >>>> lmnopqrst
> >>>> mnopqrstu
> >>>> nopqrstuv
> >>>> opqrstuvw
> >>>> pqrstuvwx
> >>>> qrstuvwxy
> >>>> rstuvwxyz
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Feb 3, 2015 at 3:05 PM, Robert Bernecky
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Compiling often helps. When I compile looping, scalar-dominated
> >>>>> APL code, I generally get about a factor of 1000X speedup.
> >>>>> You should get the same sort of speedup with compiled J.
> >>>>> [No, I do not have a J compiler, but will be happy to create one
> >>>>> if funding were to materialize.]
> >>>>>
> >>>>> However, you mentioned that you were "looping due to memory
> >>>>> constraints...". Is it possible that a different algorithm would let
> >>>>> you do less looping? E.g., looping over 1e4 elements at once,
> >>>>> or using a different approach, such as dynamic programming?
> >>>>>
> >>>>> Bob
> >>>>>
> >>>>>
> >>>>> On 15-02-03 02:53 PM, Joe Bogner wrote:
> >>>>>>
> >>>>>> I'm playing around with a coding challenge that I'm solving with a
> >>>>>> loop that needs to execute over 200 million times.
> >>>>>>
> >>>>>> Setting aside whether that's a smart approach, what is the quickest
> >>>>>> looping construct in J? I realize that J is not optimized for this
> >>>>>> style of programming, but I'm still curious if it can be done
> >>>>>> efficiently.
> >>>>>>
> >>>>>> The quickest I've found takes about 7 seconds to iterate 100 million
> >>>>>> times, whereas the same loop in C takes 0.03 seconds.
> >>>>>>
> >>>>>> 6!:2 '(] [ ])^:] 100e6'
> >>>>>> 7.00952
> >>>>>>
> >>>>>> The train is taking up some time, but eventually I will need it
> >>>>>>
> >>>>>> No train - 2.7 seconds
> >>>>>>
> >>>>>> 6!:2 ']^:] 100e6'
> >>>>>> 2.70656
> >>>>>>
> >>>>>>
> >>>>>> Explicit:
> >>>>>>
> >>>>>> loop=: 3 : 0
> >>>>>> ctr=:0
> >>>>>> while. 1 do.
> >>>>>> ctr=:>:ctr
> >>>>>> if. y < ctr do.
> >>>>>> smoutput 'done'
> >>>>>> return.
> >>>>>> end.
> >>>>>> end.
> >>>>>> )
> >>>>>>
> >>>>>> 6!:2 'loop 100e6'
> >>>>>> 122.48
> >>>>>>
> >>>>>> Clearly explicit isn't the way to go
> >>>>>>
> >>>>>> Back to the tacit:
> >>>>>>
> >>>>>> log=:smoutput bind ]
> >>>>>>
> >>>>>>       ([: <: [: 0&{:: ] ; log )^:] 10
> >>>>>> 10
> >>>>>> 9
> >>>>>> 8
> >>>>>> 7
> >>>>>> 6
> >>>>>> 5
> >>>>>> 4
> >>>>>> 3
> >>>>>> 2
> >>>>>> 1
> >>>>>> 0
> >>>>>>
> >>>>>> I would replace log with my function that uses the counter value
> >>>>>>
> >>>>>> That's pretty slow though on 100 million iterations - 88 seconds
> >>>>>>
> >>>>>> 6!:2 '([: <: [: 0&{:: ] ; ] )^:] 100e6'
> >>>>>> 88.3644
> >>>>>>
> >>>>>> Let's try a new log
> >>>>>>
> >>>>>>      log=: ] <:@:[ (smoutput bind ])
> >>>>>>       log 55
> >>>>>> 55
> >>>>>> 54
> >>>>>>
> >>>>>>
> >>>>>> That looks like it could work. Let's put a dummy in for now
> >>>>>>
> >>>>>>      log=: ] <:@:[ ]
> >>>>>>
> >>>>>> 6!:2 'log^:] 100e6'
> >>>>>> 40.9542
> >>>>>>
> >>>>>> Still 40 seconds... Any ideas on how to speed up iteration like
> >>>>>> this?
> >>>>>>
> >>>>>>
> >>>>>> Sidenote: I'm looping due to memory constraints placed on the coding
> >>>>>> challenge
> >>>>>>
> >>>>>> ----------------------------------------------------------------------
> >>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
> >>>>>>
> >>>>> --
> >>>>> Robert Bernecky
> >>>>> Snake Island Research Inc
> >>>>> 18 Fifth Street
> >>>>> Ward's Island
> >>>>> Toronto, Ontario M5J 2B9
> >>>>>
> >>>>> [email protected]
> >>>>> tel: +1 416 203 0854
> >>>>>
> >>>
> >
> >
>



-- 
Devon McCormick, CFA