Re: [Perldl] matching vectors inside a PDL

Chris Marshall Fri, 21 Nov 2014 06:37:18 -0800

Hi Ken-

I am unable to generate the error with PDL-2.007
either.  My system has 8GiB of memory and the
PDL build is using the 64bit index support.


What are the specs of your linux box and could
you please send the output of the 'perldl -V'
command.  If you built PDL from sources, then
the build log should indicate whether 64bit index
support was enabled.  If your $Config{ivsize} is
8 then you should have 64bit support as well.
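
A minimal sketch of that check, assuming only the core Config module
(the variable names are illustrative, and an ivsize of 8 only shows that
perl itself has 64bit integers; the PDL build log remains the direct
evidence for 64bit index support):

```perl
use strict;
use warnings;
use Config;   # core module exposing the perl build configuration

# ivsize is the size in bytes of perl's native integer type (IV).
my $ivsize = $Config{ivsize};
my $has_64bit_iv = ($ivsize == 8) ? 1 : 0;

printf "perl %s: ivsize=%d, 64bit IV: %s\n",
    $Config{version}, $ivsize, $has_64bit_iv ? "yes" : "no";
```

The same value can be read from the command line with 'perl -V:ivsize'.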

--Chris


On Fri, Nov 21, 2014 at 8:02 AM, Chris Marshall <devel.chm...@gmail.com> wrote:
> Ken-
>
> If you could make a short script that generates the
> problem along with the output/error messages, that
> would help.
>
> Do you have $PDL::BIGPDL set?  Might try with
> that set to 1.
>
> I'll try the problem code on PDL-2.007 to see if that
> is the reason for the differences.
>
> --Chris
>
>
>
> On Thu, Nov 20, 2014 at 6:18 PM, LYONS, KENNETH B (KENNETH)
> <k...@research.att.com> wrote:
>> Chris
>> I'm running perl 5.8.8 on a rather old linux system.  I installed the perl 
>> modules rather recently from the PDL site, so I'd expect they are up to date 
>> with whatever is there.  From the names of the files, I'd say it's 2.007.
>>
>> I've tried a variety of ways of using the inplace method, and none of them 
>> produced a perl error akin to what you got below.  The errors were coming 
>> out of the PDL module itself, complaining about the size of the piddle being 
>> over 1GB.  Given the dimensions of the piddle that is being calculated 
>> (around 200 MB), that shouldn't have happened--unless it's using doubles, 
>> which would make it ~1.6 GB.  Like I said, I got around the problem in kind 
>> of a hack, by just slicing things up 20K rows at a time--but I'd really like 
>> to find a way to do it right!
>>
>> Among the things I tried were these:
>>   $sigs->xchg(0,1) *= $present;
>>   $sigs->xchg(0,1)->inplace->mult($present,0);
>>   PDL::Ops::mult(inplace $sigs->xchg(0,1), $present, 0);
>>   $sigs->xchg(0,1)->inplace *= $present;
>> None of which got around the error.
>>
>> Below is what finally worked (but only by occupying more memory than it 
>> should):
>>
>> ($psize) = $present->dims;
>> $STEPSIZE = 20000;
>> # note: it's known that $present and $sigs have the same size!
>> for ($p = 0; $p < $psize; $p += $STEPSIZE) {
>>   my $start = $p;
>>   my $end = $start+$STEPSIZE-1;
>>   $end = $psize-1 if $end >= $psize;
>>   $sigs->xchg(0,1)->slice("$start:$end,:,:") *= $present->slice("$start:$end");
>> }
>>
>> Like I said, it's a bit of a hack.  But it does wind up doing the 
>> appropriate filtering on the $sigs matrix.
>>
>> Ken
>>
>> p.s. I don't know if it makes a difference, per se, but you are evidently 
>> operating in an interactive environment, not an actual perl script.  I'm 
>> using this to automate thru a very large body of data, eventually to be run 
>> automatically on a daily basis, so it's written as a script that calls the 
>> PDL modules.  The error I refer to above was appearing in the error output 
>> of the perl command.
>>
>> KL
>>
>> ----------------------------------------------------------------------------
>>
>> Below is the remainder of the thread that was mostly sidebar:
>> -------------------------------------
>> Hi Ken-
>>
>> You could sync up with the message I forwarded
>> to perldl by replying with this message to that
>> thread.  The main reason for keeping the discussion
>> on the list is so that others can benefit from the
>> discussion and/or offer other points of view/facts/...
>>
>> I tried the following in pdl2 and was not able to generate
>> an error.  You are right that all byte args shouldn't be
>> expanded to double intermediates.  I'm using PDL-2.007_03
>> on cygwin64/win7 and the *= works fine but I get an error
>> with the inplace construct (not the same as yours)
>>
>>   pdl> $sigs = (10*random(40,150000,26))->floor->byte
>>   pdl> $present = (20*random(150000))->floor->byte
>>   pdl> $ns = $sigs->copy
>>   pdl> ?vars
>>   PDL variables in package main::
>>
>>   Name         Type   Dimension       Flow  State          Mem
>>   ----------------------------------------------------------------
>>   $ns            Byte D [40,150000,26]        P          148.77MB
>>   $present       Byte D [150000]             P            0.14MB
>>   $sigs          Byte D [40,150000,26]        P          148.77MB
>>   pdl> $sigs->xchg(0,1) *= $present  # works
>>   pdl> $sigs = $ns->copy
>>   pdl> $sigs->xchg(0,1)->inplace *= $present
>>   Runtime error: Can't modify non-lvalue subroutine call at (eval 484) line 5.
>>
>> What are your os/platform specs and what version of PDL are
>> you using?
>>
>> --Chris
>>
>> On Thu, Nov 20, 2014 at 2:47 PM, LYONS, KENNETH B (KENNETH)
>> <k...@research.att.com> wrote:
>>> (Didn't understand your first line, as there was no cc on this message?  I 
>>> pretty much automatically avoid ever using reply-all, but I guess in this 
>>> case that's how it's supposed to work, right?  How do I cc it to get the 
>>> thread to match up?)
>>>
>>> Actually, all the pdls involved are byte type.  I was assuming when I saw 
>>> the errors occurring that it was somehow generating a double intermediate, 
>>> because it should have had plenty of room if it stayed as byte.
>>>
>>> The specific code was as follows:
>>>
>>> # sigs is byte, with dimensions about 40 x 150000 x 26
>>> # present is byte, with dimension of 150000
>>> $sigs->xchg(0,1)->inplace *= $present;
>>>
>>> I had tried numerous ways of using inplace in that line, and none of them 
>>> avoided the complaint that it had run out of memory (although the memory 
>>> usage prior to that command was about 10%).  So if it's not generating a 
>>> double intermediate, I don't see why it would run out of memory (it 
>>> shouldn't have exceeded about 20% or so).  I finally got it to work by 
>>> splitting the structures up into slices of about 20K rows each, and doing 
>>> the calculation that way.
>>>
>>> Other approaches?
>>>
>>> Ken
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Chris Marshall [mailto:devel.chm...@gmail.com]
>>> Sent: Wednesday, November 19, 2014 4:23 PM
>>> To: LYONS, KENNETH B (KENNETH)
>>> Subject: Re: [Perldl] matching vectors inside a PDL
>>>
>>> <re-cc-ing the perldl list>
>>>
>>> Thanks for the background.  If you hit a snag, feel free to
>>> post to the perldl list.  We're usually able to help for
>>> specific problems especially if accompanied by code
>>> demonstrating the problem:
>>>
>>>   If I have a big byte piddle $a and I multiply it in-place
>>>   my PDL session crashes because of a huge intermediate
>>>   temp:
>>>
>>>   pdl> $a = (10*random(100))->floor->byte;
>>>
>>>   pdl> $a->inplace->mult(5,0);
>>>   Error message here or crash
>>>
>>> Without a specific example, I would guess that the problem
>>> is the piddle you are multiplying by (or perl scalar) is of
>>> type double which would result in an intermediate temp of
>>> double type which would then collapse down to a byte piddle
>>> again at the end.  If both arguments to multiply are of byte
>>> type, you can avoid the big double intermediate temp.  E.g.
>>>
>>>   pdl> p pdl(5)->type
>>>   double
>>>   pdl> p byte(5)->type
>>>   byte
>>>
>>> Improved type support is planned for the PDL3 work.  My
>>> initial ideas for bitfield support can be seen here:
>>>
>>>   http://mailman.jach.hawaii.edu/pipermail/pdl-porters/2013-December/006132.html
>>>
>>> Hope this helps,
>>> Chris
>>>
>>> On Wed, Nov 19, 2014 at 1:42 PM, LYONS, KENNETH B (KENNETH)
>>> <k...@research.att.com> wrote:
>>>> Chris
>>>>
>>>> In answer to your question: my path in was as follows:  I wanted to find a 
>>>> way to implement an LP on a medium-size problem (<~10K variables), and the 
>>>> rest of my code was in perl--so I went looking for an LP implementation in 
>>>> perl.   I was expecting to find a C-compiled module that would do an LP 
>>>> specifically.  I found some instances of that sort of thing, but I also 
>>>> ran across one using PDL.  It didn't do quite what I wanted, but when I 
>>>> saw the PDL site, it was obvious this was something I needed to know 
>>>> about.  I wound up writing my own simplex implementation in PDL to do 
>>>> specifically what I needed, and that worked great--and I was pretty blown 
>>>> away at the speed.  So then I started looking into how I could back up and 
>>>> get the datasets I was dealing with implemented as PDLs to start with.   
>>>> So I've got a good bit of code now using PDL, not just the little simplex 
>>>> program (which was only a few dozen lines--that was pretty easy to 
>>>> implement in PDL.)
>>>>
>>>> I continue to have issues with the documentation, though.  Just as one 
>>>> example from today: the mult function seems to claim that you can get it 
>>>> to operate in-place.  And for me that was important, because I'm dealing 
>>>> with a large dataset (of byte variables).  But, not only does "mult" by 
>>>> itself cause an error because it isn't exported, but when I try to use it 
>>>> as PDL::Ops::Mult(inplace ...), or PDL::Mult(inplace ...), or as 
>>>> $piddle->inplace->mult(...), it completely fails to avoid generating a 
>>>> large intermediate.  That was clobbering my program, repetitively, so I 
>>>> finally punted and decided to break that step up into segments with only a 
>>>> few hundred thousand elements to multiply in each (using slices), and that 
>>>> got me around the problem.  But there was nothing in the documentation 
>>>> that seemed to suggest that would be necessary.  It also seemed, although 
>>>> I didn't document this carefully, that changing the default PDL type 
>>>> didn't have any impact on the size of that temporary intermediate (I think 
>>>> it was using double no matter what I did--whereas using byte would have 
>>>> been fine.)
>>>>
>>>> I'd love it, in this context, if there were a PDL type of "bit" by the 
>>>> way, since that's actually what this problem is using--it's a 3D binary 
>>>> matrix, of ones and zeroes, with up to ~3*10^9 elements.  When the number 
>>>> of elements goes above ~200M, when I'm using bytes, I have to do things to 
>>>> break it up and process one segment at a time, and it would be nice if 
>>>> that weren't necessary--but there is evidently no implementation of a 
>>>> "bit" type in PDL.
>>>>
>>>> Ken
>>>>
>>>>
>>
>> -----Original Message-----
>> From: Chris Marshall [mailto:devel.chm...@gmail.com]
>> Sent: Saturday, November 15, 2014 11:42 AM
>> To: Derek Lamb
>> Cc: LYONS, KENNETH B (KENNETH); perldl
>> Subject: Re: [Perldl] matching vectors inside a PDL
>>
>> Hi Ken and welcome to the PDL community!
>>
>>> On Nov 14, 2014, at 1:33 PM, LYONS, KENNETH B (KENNETH)
>>> <k...@research.att.com> wrote:
>>>
>>> Yes, most of this I knew, but thanks.  It’s because of that behavior of >
>>> and <, that you mentioned, that I thought that ‘==’ would compare element by
>>> element instead of on the whole vector.
>>>
>>> Have you ever tried, for example, to search the documentation for, say, the
>>> function “list”?  it gives you every occurrence of the word “list” in the
>>> documents (which, needless to say, is rather voluminous, and the first few
>>> hundred entries have nothing to do with the function!)  there should be some
>>> analog of the “man” command in unix that gives you information about the
>>> *function* without all the other garbage.  I think it’s just doing something
>>> akin to a grep thru the documents.
>>
>> In addition to the help/? and apropos/?? in the PDL shells
>> (pdl2 and perldl) there is the command line version pdldoc
>> which can be used starting with 'pdldoc pdldoc' to get the usage.
>>
>> When I do 'pdldoc -a list' I get 45 lines of output all
>> of whose descriptions seem relevant to a general search
>> for something having to do with 'list' including the 'list'
>> command itself.  This type of problem is not specific to
>> PDL as searching the docs for any complicated system
>> or program does tend to produce a large number of
>> incomprehensible and not particularly useful results.
>>
>> It is definitely desired to have smarter and more useful
>> documentation searches.  The ability to add keywords
>> would be nice and is on the feature request list, I believe.
>>
>> If you are on unix-ish system, you might try something
>> like 'pdldoc -a list | grep --color list' to make the output
>> more visually comprehensible.  (We should probably add
>> that to the PDL shell output)
>>
>>> It’s horribly designed in that regard.  The software itself is great, and
>>> I’m very happy with the results, but finding the simplest little thing in
>>> the docs can be a total pain!
>>
>> I understand the frustration.  What would really help PDL
>> development would be to know how you got to using
>> PDL without being introduced to the concepts that would
>> have made the learning curve less steep.  (At least you
>> found the mailing list and used it. :-)
>>
>> Happy PDL-ing!
>> Chris
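
The sizes discussed in the thread can be checked with plain arithmetic.
The sketch below uses only core Perl and takes the dimensions
[40,150000,26] from the ?vars output quoted above; the per-element sizes
(1 byte for byte, 8 bytes for double) are the usual ones. It only shows
how large a double copy would be. It does not establish that PDL
actually created one in Ken's case.

```perl
use strict;
use warnings;

# Dimensions of $sigs as reported by ?vars in the thread.
my @dims = (40, 150_000, 26);
my $nelem = 1;
$nelem *= $_ for @dims;

# Storage per element for the two types under discussion.
my %bytes_per_elem = (byte => 1, double => 8);

my %mib;
for my $type (sort keys %bytes_per_elem) {
    $mib{$type} = $nelem * $bytes_per_elem{$type} / 2**20;
    printf "%-6s %8.2f MiB\n", $type, $mib{$type};
}
# prints:
# byte     148.77 MiB
# double  1190.19 MiB
```

The byte figure matches the 148.77MB shown by ?vars, and the double
figure is a little over 1 GiB, which would be consistent with a
complaint about a piddle over 1GB if a double intermediate were
allocated.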

_______________________________________________
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
