Re: [Perldl] matching vectors inside a PDL

Chris Marshall Fri, 21 Nov 2014 05:05:42 -0800

Ken-

If you could make a short script that generates the
problem along with the output/error messages, that
would help.


Do you have $PDL::BIGPDL set?  You might try
setting it to 1.
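A minimal sketch of where that setting goes in a script. It assumes, as Ken's "over 1GB" error suggests, that PDL's large-allocation check is controlled by `$PDL::BIGPDL`, and that it must be set before the big piddles are created. The dimensions are the ones from Ken's problem; both operands are byte.

```perl
use strict;
use warnings;
use PDL;

# Allow piddles over 1GB; set this before any large allocation.
$PDL::BIGPDL = 1;

# Dimensions from the problem: 40 x 150000 x 26 byte elements (~149 MB).
my $sigs    = zeroes(byte, 40, 150000, 26);
my $present = zeroes(byte, 150000);

# Multiply every row (dim 1) of $sigs by the matching entry of $present.
$sigs->xchg(0,1) *= $present;
```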

I'll try the problem code on PDL-2.007 to see if that
is the reason for the differences.

--Chris



On Thu, Nov 20, 2014 at 6:18 PM, LYONS, KENNETH B (KENNETH)
<k...@research.att.com> wrote:
> Chris
> I'm running perl 5.8.8 on a rather old Linux system.  I installed the perl 
> modules rather recently from the PDL site, so I'd expect they are up to date 
> with whatever is there.  From the names of the files, I'd say it's 2.007.
>
> I've tried a variety of ways of using the inplace method, and none of them 
> produced a perl error akin to what you got below.  The errors were coming out 
> of the PDL module itself, complaining about the size of the piddle being over 
> 1GB.  Given the dimensions of the piddle that is being calculated (around 200 
> MB), that shouldn't have happened--unless it's using doubles, which would 
> make it ~1.6 GB.  Like I said, I got around the problem in kind of a hack, by 
> just slicing things up 20K rows at a time--but I'd really like to find a way 
> to do it right!
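To check that estimate, a plain-Perl calculation using the dimensions given later in the thread (40 x 150000 x 26) shows a byte piddle needs about 149 MB, which matches the 148.77MB that `?vars` reports further down. A double copy needs about 1190 MB, which is over 1GB. "MB" here means 2^20 bytes, which appears to be what `?vars` uses.

```perl
use strict;
use warnings;

# Dimensions of $sigs from the thread: 40 x 150000 x 26.
my @dims  = (40, 150_000, 26);
my $nelem = 1;
$nelem *= $_ for @dims;            # 156,000,000 elements

my $mib       = 1024 * 1024;
my %bytes_per = (byte => 1, double => 8);

my %size_mib;
for my $type (qw(byte double)) {
    my $size = $nelem * $bytes_per{$type};
    $size_mib{$type} = $size / $mib;
    printf "%-6s %8.2f MB%s\n", $type, $size_mib{$type},
        $size > 1024 * $mib ? "  (over 1GB)" : "";
}
```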
>
> Among the things I tried were these:
>   $sigs->xchg(0,1) *= $present;
>   $sigs->xchg(0,1)->inplace->mult($present,0);
>   PDL::Ops::mult(inplace $sigs->xchg(0,1), $present, 0);
>   $sigs->xchg(0,1)->inplace *= $present;
> None of which got around the error.
>
> Below is what finally worked (but only by occupying more memory than it 
> should):
>
> my ($psize) = $present->dims;
> my $STEPSIZE = 20000;
> # note: it's known that $present has the same length as dim 1 of $sigs!
> for (my $p = 0; $p < $psize; $p += $STEPSIZE) {
>   my $start = $p;
>   my $end = $start+$STEPSIZE-1;
>   $end = $psize-1 if $end >= $psize;
>   $sigs->xchg(0,1)->slice("$start:$end,:,:") *=
>     $present->slice("$start:$end");
> }
>
> Like I said, it's a bit of a hack.  But it does wind up doing the appropriate 
> filtering on the $sigs matrix.
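The start/end arithmetic in Ken's loop can be checked separately in plain Perl. The helper name `chunk_ranges` is hypothetical. With 150000 rows and a step of 20000, it yields eight inclusive ranges, and the last one is shorter than the others.

```perl
use strict;
use warnings;

# Hypothetical helper: split 0 .. $n-1 into inclusive [start, end] ranges of
# at most $step items, using the same start/end arithmetic as the loop.
sub chunk_ranges {
    my ($n, $step) = @_;
    my @ranges;
    for (my $start = 0; $start < $n; $start += $step) {
        my $end = $start + $step - 1;
        $end = $n - 1 if $end >= $n;
        push @ranges, [$start, $end];
    }
    return @ranges;
}

my @ranges = chunk_ranges(150_000, 20_000);
printf "%d chunks, last is %d:%d\n", scalar(@ranges), @{ $ranges[-1] };
```

In the PDL loop, each pair becomes a slice string of the form `"$start:$end"`.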
>
> Ken
>
> p.s. I don't know if it makes a difference, per se, but you are evidently 
> operating in an interactive environment, not an actual perl script.  I'm 
> using this to automate through a very large body of data, eventually to be run 
> automatically on a daily basis, so it's written as a script that calls the 
> PDL modules.  The error I refer to above was appearing in the error output of 
> the perl command.
>
> KL
>
> ----------------------------------------------------------------------------
>
> Below is the remainder of the thread that was mostly sidebar:
> -------------------------------------
> Hi Ken-
>
> You could sync up with the message I forwarded
> to perldl by replying with this message to that
> thread.  The main reason for keeping the discussion
> on the list is so that others can benefit from the
> discussion and/or offer other points of view/facts/...
>
> I tried the following in pdl2 and was not able to generate
> an error.  You are right that all byte args shouldn't be
> expanded to double intermediates.  I'm using PDL-2.007_03
> on cygwin64/win7, and the *= works fine, but I get an error
> with the inplace construct (not the same as yours):
>
>   pdl> $sigs = (10*random(40,150000,26))->floor->byte
>   pdl> $present = (20*random(150000))->floor->byte
>   pdl> $ns = $sigs->copy
>   pdl> ?vars
>   PDL variables in package main::
>
>   Name         Type   Dimension       Flow  State          Mem
>   ----------------------------------------------------------------
>   $ns            Byte D [40,150000,26]        P          148.77MB
>   $present       Byte D [150000]             P            0.14MB
>   $sigs          Byte D [40,150000,26]        P          148.77MB
>   pdl> $sigs->xchg(0,1) *= $present  # works
>   pdl> $sigs = $ns->copy
>   pdl> $sigs->xchg(0,1)->inplace *= $present
>   Runtime error: Can't modify non-lvalue subroutine call at (eval 484) line 5.
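For reference, here is a scaled-down version of the form that worked in the session above, so the result is easy to check. It is a sketch that assumes PDL-2.007-era behaviour, and the small sizes and flag values are made up for illustration. No `inplace` is needed: `*=` updates the `xchg` view in place, and the change flows back into `$sigs`.

```perl
use strict;
use warnings;
use PDL;

# Small stand-ins for $sigs (40 x 150000 x 26) and $present (150000).
my $sigs    = ones(byte, 4, 6, 3);          # dims [4,6,3], all ones
my $present = byte([1, 0, 1, 1, 0, 1]);     # one flag per row (dim 1)

# xchg(0,1) is a view with dims [6,4,3]; $present (dims [6]) is threaded
# over the remaining dims, and the product is written back into $sigs.
$sigs->xchg(0,1) *= $present;

print $sigs->type, " ", join("x", $sigs->dims), "\n";   # byte 4x6x3
print $sigs->sum, "\n";   # 4 rows kept x 4 x 3 = 48
```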
>
> What are your OS/platform specs, and what version of PDL are
> you using?
>
> --Chris
>
> On Thu, Nov 20, 2014 at 2:47 PM, LYONS, KENNETH B (KENNETH)
> <k...@research.att.com> wrote:
>> (Didn't understand your first line, as there was no cc on this message?  I 
>> pretty much automatically avoid ever using reply-all, but I guess in this 
>> case that's how it's supposed to work, right?  How do I cc it to get the 
>> thread to match up?)
>>
>> Actually, all the pdls involved are byte type.  I was assuming when I saw 
>> the errors occurring that it was somehow generating a double intermediate, 
>> because it should have had plenty of room if it stayed as byte.
>>
>> The specific code was as follows:
>>
>> # sigs is byte, with dimensions about 40 x 150000 x 26
>> # present is byte, with dimension of 150000
>> $sigs->xchg(0,1)->inplace *= $present;
>>
>> I had tried numerous ways of using inplace in that line, and none of them 
>> avoided the complaint that it had run out of memory (although the memory 
>> usage prior to that command was about 10%).  So if it's not generating a 
>> double intermediate, I don't see why it would run out of memory (it 
>> shouldn't have exceeded about 20% or so).  I finally got it to work by 
>> splitting the structures up into slices of about 20K rows each, and doing 
>> the calculation that way.
>>
>> Other approaches?
>>
>> Ken
>>
>>
>>
>>
>> -----Original Message-----
>> From: Chris Marshall [mailto:devel.chm...@gmail.com]
>> Sent: Wednesday, November 19, 2014 4:23 PM
>> To: LYONS, KENNETH B (KENNETH)
>> Subject: Re: [Perldl] matching vectors inside a PDL
>>
>> <re-cc-ing the perldl list>
>>
>> Thanks for the background.  If you hit a snag, feel free to
>> post to the perldl list.  We're usually able to help for
>> specific problems especially if accompanied by code
>> demonstrating the problem:
>>
>>   If I have a big byte piddle $a and I multiply it in-place
>>   my PDL session crashes because of a huge intermediate
>>   temp:
>>
>>   pdl> $a = (10*random(100))->floor->byte;
>>
>>   pdl> $a->inplace->mult(5,0);
>>   Error message here or crash
>>
>> Without a specific example, I would guess that the problem
>> is that the piddle (or perl scalar) you are multiplying by is of
>> type double, which would result in an intermediate temp of
>> double type that would then collapse down to a byte piddle
>> again at the end.  If both arguments to multiply are of byte
>> type, you can avoid the big double intermediate temp.  E.g.
>>
>>   pdl> p pdl(5)->type
>>   double
>>   pdl> p byte(5)->type
>>   byte
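Following that suggestion, here is a minimal sketch that keeps both operands byte so no double temporary is involved. It assumes the `inplace`/`mult` method form with the trailing swap argument from the example above, and the 100-element size is illustrative only.

```perl
use strict;
use warnings;
use PDL;

my $x    = (10 * random(100))->floor->byte;   # byte piddle, values 0..9
my $orig = $x->copy;

# byte(5) is a byte-typed scalar piddle; pdl(5) would be double.
$x->inplace->mult(byte(5), 0);

print $x->type, "\n";    # byte
```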
>>
>> Improved type support is planned for the PDL3 work.  My
>> initial ideas for bitfield support can be seen here:
>>
>>   
>> http://mailman.jach.hawaii.edu/pipermail/pdl-porters/2013-December/006132.html
>>
>> Hope this helps,
>> Chris
>>
>> On Wed, Nov 19, 2014 at 1:42 PM, LYONS, KENNETH B (KENNETH)
>> <k...@research.att.com> wrote:
>>> Chris
>>>
>>> In answer to your question: my path in was as follows:  I wanted to find a 
>>> way to implement an LP on a medium-size problem (<~10K variables), and the 
>>> rest of my code was in perl--so I went looking for an LP implementation in 
>>> perl.   I was expecting to find a C-compiled module that would do an LP 
>>> specifically.  I found some instances of that sort of thing, but I also ran 
>>> across one using PDL.  It didn't do quite what I wanted, but when I saw the 
>>> PDL site, it was obvious this was something I needed to know about.  I 
>>> wound up writing my own simplex implementation in PDL to do specifically 
>>> what I needed, and that worked great--and I was pretty blown away at the 
>>> speed.  So then I started looking into how I could back up and get the 
>>> datasets I was dealing with implemented as PDLs to start with.   So I've 
>>> got a good bit of code now using PDL, not just the little simplex program 
>>> (which was only a few dozen lines--that was pretty easy to implement in 
>>> PDL.)
>>>
>>> I continue to have issues with the documentation, though.  Just as one 
>>> example from today: the mult function seems to claim that you can get it to 
>>> operate in-place.  And for me that was important, because I'm dealing with 
>>> a large dataset (of byte variables).  But, not only does "mult" by itself 
>>> cause an error because it isn't exported, but when I try to use it as 
>>> PDL::Ops::Mult(inplace ...), or PDL::Mult(inplace ...), or as 
>>> $piddle->inplace->mult(...), it completely fails to avoid generating a 
>>> large intermediate.  That was clobbering my program, repetitively, so I 
>>> finally punted and decided to break that step up into segments with only a 
>>> few hundred thousand elements to multiply in each (using slices), and that 
>>> got me around the problem.  But there was nothing in the documentation that 
>>> seemed to suggest that would be necessary.  It also seemed, although I 
>>> didn't document this carefully, that changing the default PDL type didn't 
>>> have any impact on the size of that temporary intermediate (I think it was 
>>> using double no matter what I did--whereas using byte would have been fine.)
>>>
>>> I'd love it, in this context, if there were a PDL type of "bit" by the way, 
>>> since that's actually what this problem is using--it's a 3D binary matrix, 
>>> of ones and zeroes, with up to ~3*10^9 elements.  When the number of 
>>> elements goes above ~200M, when I'm using bytes, I have to do things to 
>>> break it up and process one segment at a time, and it would be nice if that 
>>> weren't necessary--but there is evidently no implementation of a "bit" type 
>>> in PDL.
>>>
>>> Ken
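PDL has no bit type here, but for storage alone, core Perl's `vec` can pack one flag per bit into a string. For ~3*10^9 elements, that is about 358 MiB instead of about 2.8 GiB as bytes. This sketch covers packing and counting only. The packed string is not a piddle, so PDL arithmetic cannot operate on it directly, and the flag pattern below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical flags: 1 for every index divisible by 3, else 0.
my $n    = 1_000;
my $bits = '';
for my $i (0 .. $n - 1) {
    vec($bits, $i, 1) = ($i % 3 == 0) ? 1 : 0;   # one bit per element
}

my $set = unpack('%32b*', $bits);   # count of 1 bits in the string
printf "%d flags stored in %d bytes, %d set\n", $n, length($bits), $set;
```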
>>>
>>>
>
> -----Original Message-----
> From: Chris Marshall [mailto:devel.chm...@gmail.com]
> Sent: Saturday, November 15, 2014 11:42 AM
> To: Derek Lamb
> Cc: LYONS, KENNETH B (KENNETH); perldl
> Subject: Re: [Perldl] matching vectors inside a PDL
>
> Hi Ken and welcome to the PDL community!
>
>> On Nov 14, 2014, at 1:33 PM, LYONS, KENNETH B (KENNETH)
>> <k...@research.att.com> wrote:
>>
>> Yes, most of this I knew, but thanks.  It’s because of that behavior of ‘>’
>> and ‘<’, that you mentioned, that I thought that ‘==’ would compare element by
>> element instead of on the whole vector.
>>
>> Have you ever tried, for example, to search the documentation for, say, the
>> function “list”?  It gives you every occurrence of the word “list” in the
>> documents (which, needless to say, is rather voluminous, and the first few
>> hundred entries have nothing to do with the function!).  There should be some
>> analog of the “man” command in unix that gives you information about the
>> *function* without all the other garbage.  I think it’s just doing something
>> akin to a grep through the documents.
>
> In addition to the help/? and apropos/?? in the PDL shells
> (pdl2 and perldl) there is the command line version pdldoc
> which can be used starting with 'pdldoc pdldoc' to get the usage.
>
> When I do 'pdldoc -a list' I get 45 lines of output all
> of whose descriptions seem relevant to a general search
> for something having to do with 'list' including the 'list'
> command itself.  This type of problem is not specific to
> PDL as searching the docs for any complicated system
> or program does tend to produce a large number of
> incomprehensible and not particularly useful results.
>
> It is definitely desired to have smarter and more useful
> documentation searches.  The ability to add keywords
> would be nice and is on the feature request list, I believe.
>
> If you are on unix-ish system, you might try something
> like 'pdldoc -a list | grep --color list' to make the output
> more visually comprehensible.  (We should probably add
> that to the PDL shell output)
>
>> It’s horribly designed in that regard.  The software itself is great, and
>> I’m very happy with the results, but finding the simplest little thing in
>> the docs can be a total pain!
>
> I understand the frustration.  What would really help PDL
> development would be to know how you got to using
> PDL without being introduced to the concepts that would
> have made the learning curve less steep.  (At least you
> found the mailing list and used it. :-)
>
> Happy PDL-ing!
> Chris

_______________________________________________
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
