Another approach, depending on what the processing looks like, might
involve selecting indices based on something smaller than nine
characters. For example, if certain character pairs were "valid" or
maybe even "better than others" you could get indices which select
those pairs and further refine from there.

But the blocked approach can also be a very good one. (Just make sure
to handle infixes which span block boundaries.)

Thanks,

-- 
Raul

On Tue, Feb 3, 2015 at 3:54 PM, Robert Bernecky
<[email protected]> wrote:
> I agree that you need to have more than one cell on hand to do the nub!
>
> However,  try this approach. It's crude, but it might match or beat C:
>
>    - decide how much input you can process at once.
>    - Result =. ''
>    - while more hunks, loop over:
>        -    hunkresult = nub-of-infix on current hunk of the input.
>        -    Result = nub Result,hunkresult
>
> This lets J do most of the heavy lifting, while keeping you
> within the memory constraints.
>
> Bob
>
> On 15-02-03 03:45 PM, Joe Bogner wrote:
>>
>> Sorry for the double email, I realized I replied to you directly. I'm
>> replying to you directly since you replied to me directly, just in
>> case you didn't want to send it to chat...
>>
>> In any case, the whole goal is to get the nub of the 9 character
>> infixes. There is special code for some of the mathematical operations
>> on infixes (such as sum) that don't generate all the infixes, however
>> I having found anything for nub (distinct), which makes some sense
>> since it can't be reduced to an atom like the math operations
>>
>> Thanks for the ideas
>>
>>
>> On Tue, Feb 3, 2015 at 3:42 PM, Robert Bernecky
>> <[email protected]> wrote:
>>>
>>> "9 character infixes" - that looks like an input to a convolution, does
>>> it
>>> not?
>>> If so, then there should be some J dohickey (adverb or conjunction)
>>>   (My J is very very rusty, and likely obsolete...)
>>> to do that that will invoke your kernel verb without generating all the
>>> infixes.
>>>
>>> By convolution here, I mean something like moving-window inner
>>> product, string search, or, err, convolution.
>>>
>>> What processing do you apply to each infix?
>>>
>>> Bob
>>>
>>>
>>>
>>> On 15-02-03 03:36 PM, Joe Bogner wrote:
>>>>
>>>> Thanks Bob. I agree that compiling often helps. I have previously
>>>> kicked around the idea of generating byte code or machine code and
>>>> interpreting it or executing it. I have called dlls in other
>>>> circumstances, but it would be nice to have everything in J.
>>>>
>>>> In this case, the challenge put a 2GB constraint on memory and 60
>>>> seconds of execution time. It's an artificial constraint, which is why
>>>> I posted to chat. Those constraint wouldn't exist in the real world.
>>>>
>>>> I'm scanning a 240MB text file and generating 9 character infixes and
>>>> processing them. I hit the memory limit using 9]\ so I figured I'd try
>>>> looping and using memr (15!:1). In an earlier version, about 270
>>>> seconds were spent in the looping construct and 40 seconds in memr. It
>>>> seemed like there should be a faster looping construct
>>>>
>>>> Here's a simple version:
>>>>
>>>> txt=: a. {~ (97&+(i. 26))
>>>> addr=: mema (# txt)
>>>> txt memw addr,0,(#txt)
>>>> 6!:2 '([:<:([:0&{::(] ; smoutput&([:15!:1(addr,9,~<:@:]))    ))) ^:]
>>>> (<:#txt)'
>>>>
>>>>
>>>>      yz
>>>> xyz
>>>> wxyz
>>>> vwxyz
>>>> uvwxyz
>>>> tuvwxyz
>>>> stuvwxyz
>>>> rstuvwxyz
>>>> qrstuvwxy
>>>> pqrstuvwx
>>>> opqrstuvw
>>>> nopqrstuv
>>>> mnopqrstu
>>>> lmnopqrst
>>>> klmnopqrs
>>>> jklmnopqr
>>>> ijklmnopq
>>>> hijklmnop
>>>> ghijklmno
>>>> fghijklmn
>>>> efghijklm
>>>> defghijkl
>>>> cdefghijk
>>>> bcdefghij
>>>> abcdefghi
>>>>
>>>>
>>>> I would need more processing of the results, but you can see roughly
>>>> how it compares to 9]\
>>>>
>>>> 9[\txt
>>>> abcdefghi
>>>> bcdefghij
>>>> cdefghijk
>>>> defghijkl
>>>> efghijklm
>>>> fghijklmn
>>>> ghijklmno
>>>> hijklmnop
>>>> ijklmnopq
>>>> jklmnopqr
>>>> klmnopqrs
>>>> lmnopqrst
>>>> mnopqrstu
>>>> nopqrstuv
>>>> opqrstuvw
>>>> pqrstuvwx
>>>> qrstuvwxy
>>>> rstuvwxyz
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 3, 2015 at 3:05 PM, Robert Bernecky
>>>> <[email protected]> wrote:
>>>>>
>>>>> Compiling often helps. When I compile looping, scalar-dominated
>>>>> APL code, I generally get about a factor of 1000X speedup.
>>>>> You should get the same sort of speedup with compiled J.
>>>>> [No, I do not have a J compiler, but will be happy to create one
>>>>> if funding were to materialize.]
>>>>>
>>>>> However, you mentioned that you were "looping due to memory
>>>>> constraints...". Is it possible that a different algorithm would let
>>>>> you do less looping? E.g., looping over 1e4 elements at once,
>>>>> or using a different approach, such as dynamic programming?
>>>>>
>>>>> Bob
>>>>>
>>>>>
>>>>> On 15-02-03 02:53 PM, Joe Bogner wrote:
>>>>>>
>>>>>> I'm playing around with a coding challenge that I'm solving with a
>>>>>> loop that needs to execute over 200 million times.
>>>>>>
>>>>>> Setting aside whether that's a smart approach, what is the quickest
>>>>>> looping construct in J? I realize that J is not optimized for this
>>>>>> style of programming, but I'm still curious if it can be done
>>>>>> efficiently.
>>>>>>
>>>>>> The quickest I've found takes about 7 seconds to iterate 100 million
>>>>>> times, whereas the same loop in C takes 0.03 seconds.
>>>>>>
>>>>>> 6!:2 '(] [ ])^:] 100e6'
>>>>>> 7.00952
>>>>>>
>>>>>> The train is taking up some time, but eventually I will need it
>>>>>>
>>>>>> No train - 2.7 seconds
>>>>>>
>>>>>> 6!:2 ']^:] 100e6'
>>>>>> 2.70656
>>>>>>
>>>>>>
>>>>>> Explicit:
>>>>>>
>>>>>> loop=: 3 : 0
>>>>>> ctr=:0
>>>>>> while. 1 do.
>>>>>> ctr=:>:ctr
>>>>>> if. y < ctr do.
>>>>>> smoutput 'done'
>>>>>> return.
>>>>>> end.
>>>>>> end.
>>>>>> )
>>>>>>
>>>>>> 6!:2 'loop 100e6'
>>>>>> 122.48
>>>>>>
>>>>>> Clearly explicit isn't the way to go
>>>>>>
>>>>>> Back to the tacit:
>>>>>>
>>>>>> log=:smoutput bind ]
>>>>>>
>>>>>>       ([: <: [: 0&{:: ] ; log )^:] 10
>>>>>> 10
>>>>>> 9
>>>>>> 8
>>>>>> 7
>>>>>> 6
>>>>>> 5
>>>>>> 4
>>>>>> 3
>>>>>> 2
>>>>>> 1
>>>>>> 0
>>>>>>
>>>>>> I would replace log with my function that uses the counter value
>>>>>>
>>>>>> That's pretty slow though on 100 million iterations - 88 seconds
>>>>>>
>>>>>> 6!:2 '([: <: [: 0&{:: ] ; ] )^:] 100e6'
>>>>>> 88.3644
>>>>>>
>>>>>> Let's try a new log
>>>>>>
>>>>>>      log=: ] <:@:[ (smoutput bind ])
>>>>>>       log 55
>>>>>> 55
>>>>>> 54
>>>>>>
>>>>>>
>>>>>> That looks like it could work. Let's put a dummy in for now
>>>>>>
>>>>>>      log=: ] <:@:[ ]
>>>>>>
>>>>>> 6!:2 'log^:] 100e6'
>>>>>> 40.9542
>>>>>>
>>>>>> Still 40 seconds... Any ideas on how to speed up iteration like this?
>>>>>>
>>>>>>
>>>>>> Sidenote: I'm looping due to memory constraints placed on the coding
>>>>>> challenge
>>>>>> ----------------------------------------------------------------------
>>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>>>
>>>>> --
>>>>> Robert Bernecky
>>>>> Snake Island Research Inc
>>>>> 18 Fifth Street
>>>>> Ward's Island
>>>>> Toronto, Ontario M5J 2B9
>>>>>
>>>>> [email protected]
>>>>> tel: +1 416 203 0854
>>>>>
>>>
>>> --
>>> Robert Bernecky
>>> Snake Island Research Inc
>>> 18 Fifth Street
>>> Ward's Island
>>> Toronto, Ontario M5J 2B9
>>>
>>> [email protected]
>>> tel: +1 416 203 0854
>>>
>
>
> --
> Robert Bernecky
> Snake Island Research Inc
> 18 Fifth Street
> Ward's Island
> Toronto, Ontario M5J 2B9
>
> [email protected]
> tel: +1 416 203 0854
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to