Something that I think will be critical, especially if we start
JIT-compiling stuff or allowing for subclassing, is that customized code
could take a performance hit if it leads to code cache misses. I
recently came across a great explanation here:
http://igoro.com/archive/gallery-of-processor-cache-effects/

One of the files in the Perl interpreter's core is called pp_hot.c.
According to the comments at the top of that file, the hottest opcode
functions are consolidated into a single C (and later object) file to
"encourage CPU cache hits on hot code." If we create more and more code
paths that get executed, we increase the time spent loading machine code
into the L1 instruction cache, and we also increase the likelihood of
evicting parts of pp_hot and other important execution paths.
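
As a rough sketch of how much locality alone can matter (this exercises the
data cache rather than the code cache, and the numbers will vary by machine),
compare a contiguous and a strided traversal of the same PDL data:

```perl
use strict;
use warnings;
use PDL;
use Benchmark qw(cmpthese);

# 4000x4000 doubles; the first dimension is contiguous in memory.
my $m = zeroes(4000, 4000);
my $t = $m->xchg(0, 1);   # virtual dim swap, no data copied

cmpthese(-2, {
    # Sums that walk memory sequentially: cache-friendly.
    contiguous => sub { my $s = $m->sumover },
    # Sums that walk memory with a 4000-element stride: cache-hostile.
    strided    => sub { my $s = $t->sumover },
});
```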

David

On Mon, Dec 15, 2014 at 12:45 PM, David Mertens <dcmertens.p...@gmail.com>
wrote:
>
> FWIW, it looks like Julia views are like affine slices in PDL. As I have
> said before, almost nothing else out there has the equivalent of the
> non-contiguous, non-strided support we get with which, where, and their
> ilk. GSL vectors do not, either. MATLAB only supports it as a temporary
> object, and discards it once the line has executed. I'm not sure about
> NumPy here.
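>
> A quick example of the kind of selection I mean (real which/where calls;
> the output is what I'd expect, but treat it as a sketch rather than a test):
>
> ```perl
> use PDL;
> my $x   = sequence(10);
> my $idx = which($x % 3 == 0);      # indices 0, 3, 6, 9 (not a strided slice)
> my $sel = $x->where($x % 3 == 0);  # dataflow-connected selection
> $sel .= -1;                        # writes flow back into $x
> print $x, "\n";                    # [-1 1 2 -1 4 5 -1 7 8 -1]
> ```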
>
> David
>
> On Mon, Dec 15, 2014 at 11:32 AM, Chris Marshall <devel.chm...@gmail.com>
> wrote:
>
>> > On Sun, Dec 14, 2014 at 11:56 PM, Zakariyya Mughal <
>> zaki.mug...@gmail.com> wrote:
>> >
>> > ...snip...
>> >
>> > ## Levels of measurement
>> >
>> >   When using R, one of the nice things it does is warn or give an error
>> >   when you try to do an operation that would be invalid on a certain type
>> >   of data. One such type is categorical data, which R calls factors and
>> >   for which I made a subclass of PDL called PDL::Factor. Some of this
>> >   behaviour is inspired by the statistical methodology of levels of
>> >   measurement <https://en.wikipedia.org/wiki/Level_of_measurement>. I
>> >   believe SAS even explicitly allows assigning levels of measurement to
>> >   variables.
>>
>> +1, it would be nice if new PDL types supported varying
>> levels of computation including by levels of measurement
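>>
>> As a minimal sketch of the kind of guard that could give us (hypothetical
>> class and method names, not the real PDL::Factor API):
>>
>> ```perl
>> package My::Nominal;
>> use strict;
>> use warnings;
>> use Carp ();
>> use PDL;
>>
>> # Wrap a piddle of integer category codes plus the level labels.
>> sub new {
>>     my ($class, $codes, @levels) = @_;
>>     return bless { codes => PDL->pdl($codes), levels => \@levels }, $class;
>> }
>>
>> # Counting per-level frequencies is meaningful for nominal data...
>> sub counts {
>>     my ($self) = @_;
>>     my $h = $self->{codes}->hist(0, scalar @{ $self->{levels} }, 1);
>>     return $h;
>> }
>>
>> # ...but a mean is not, so refuse it loudly.
>> sub mean { Carp::croak("mean() is undefined for nominal (categorical) data") }
>>
>> 1;
>> ```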
>>
>> > ...snip...
>> >
>> >   `NA` is R's equivalent of `BAD` values. For `mean()` this makes sense
>> >   for categorical data. For logical vectors, it does something else:
>>
>> I would like to see more generalized support for bad-value computations,
>> since in some cases BAD is used for missing data and in others BAD is
>> used for invalid data, ...
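>>
>> For reference, here is roughly how the single BAD flavour behaves today
>> (standard PDL bad-value calls; a sketch, not a test of exact output):
>>
>> ```perl
>> use PDL;
>>
>> my $x = pdl(1, 2, -999, 4);
>> $x = $x->setbadif($x == -999);   # the sentinel becomes BAD, whether it
>>                                  # originally meant "missing" or "invalid"
>> print "good: ", $x->ngood, "  bad: ", $x->nbad, "\n";   # good: 3  bad: 1
>> print "sum:  ", $x->sum, "\n";                          # BAD elements skipped
>> ```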
>>
>> > ...snip...
>> >
>> >   Thinking in terms of levels of measurement can help with another
>> >   experiment I'm doing, which is based around tracking the units of
>> >   measure used for numerical things in Perl. Code is here
>> >   <https://github.com/zmughal/units-experiment/blob/master/overload_override.pl>.
>> >
>> >   What I do there is use Moo roles to add a unit attribute to numerical
>> >   types (Perl scalars, Number::Fraction, PDL, etc.), and whenever they go
>> >   through an operation, by either operator overloading or calling a
>> >   function such as `sum()`, the unit is carried along with the result and
>> >   manipulated appropriately (you can take the mean of kelvin temperatures,
>> >   but not of degrees Celsius). I know that units of measure are messy to
>> >   implement, but being able to support auxiliary operations like this will
>> >   go a long way toward making PDL flexible.
>>
>> Yes!  The use of method modifiers offers some powerful development
>> tools for implementing various high-level features.  I'm hoping they
>> can be used to augment core functionality to support many of the more
>> powerful or flexible features such as JIT compiling, GPU computation,
>> distributed computation, ...
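>>
>> A minimal sketch of that combination (hypothetical role and method names;
>> 'sum' just stands in for whatever methods the consuming class provides):
>>
>> ```perl
>> package My::Role::HasUnit;
>> use Moo::Role;
>>
>> requires 'sum';
>>
>> has unit => (is => 'rw', default => '');
>>
>> # Wrap an existing numeric method so the unit rides along with the result.
>> around sum => sub {
>>     my ($orig, $self, @args) = @_;
>>     my $value = $self->$orig(@args);
>>     return { value => $value, unit => $self->unit };
>> };
>>
>> 1;
>> ```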
>> >
>> >   [Has anyone used udunits2? I made an Alien package for it. It's on
>> >   CPAN.]
>> >
>> > ## DataShape and Blaze
>>
>> This looks a lot like what the PDL::Tiny core is shaping up to be.
>> Another goal of PDL::Tiny is flexibility so that PDL can use and
>> be used by/from other languages.
>>
>> >   I think it would be beneficial to look at the work being done by the
>> >   Blaze project <http://blaze.pydata.org/> with its DataShape
>> >   specification <http://datashape.pydata.org/>. The idea behind it is to
>> >   be able to use the various array-like APIs without having to worry
>> >   about what is going on in the backend, be it CPU-based, GPU-based,
>> >   SciDB, or even a SQL server.
>> >
>> > ## Julia
>> >
>> >   Julia has been doing some amazing things with how they've grown their
>> >   language. I was looking to see if they have anything similar to the
>> >   dataflow in PDL and I came across ArrayViews
>> >   <https://github.com/JuliaLang/ArrayViews.jl>. It may be enlightening to
>> >   see how they compose this feature onto already existing n-d arrays, as
>> >   opposed to how PDL does it.
>> >
>> >   I do not know what tradeoffs that brings, but it is a starting point to
>> >   think about. I think similar approaches could be used to support sparse
>> >   arrays.
>>
>> Julia views look a lot like what we call slices.
>>
>> >   In fact, one of Julia's strengths is how they use multimethods to
>> >   handle new types with ease. See "The Design Impact of Multiple Dispatch"
>> >   <http://nbviewer.ipython.org/gist/StefanKarpinski/b8fe9dbb36c1427b9f22>
>> >   for examples. [Perl 6 has built-in multimethods.]
>>
>> Multi-methods may be a good way to support some of the new PDL
>> capabilities in a way that can be expanded by plugins, at runtime,
>> ...
>>
>>
>> > ## MATLAB subclassing
>> >
>> > ...snip...
>> >
>> > ## GPU and threading
>> >
>> >   I think it would be best to offload GPU support to other libraries, so
>> >   it would be good to extract what is common between libraries like
>> >
>> >   - MAGMA <http://icl.cs.utk.edu/magma/>,
>> >   - ViennaCL <http://viennacl.sourceforge.net/>,
>> >   - Blaze-lib  <https://code.google.com/p/blaze-lib/>,
>> >   - VXL <http://vxl.sourceforge.net/>,
>> >   - Spark <http://spark.apache.org/>,
>> >   - Torch <http://torch.ch/>,
>> >   - Theano <http://www.deeplearning.net/software/theano/>,
>> >   - Eigen <http://eigen.tuxfamily.org/>, and
>> >   - Armadillo <http://arma.sourceforge.net/>.
>> >
>> >   Eigen is interesting in particular because it supports storing data in
>> >   both row-major and column-major order
>> >   <http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.
>>
>> We would benefit from supporting the commonalities needed to work
>> with other GPU computation libraries.  I'm not sure that all
>> PDL computations can be run efficiently if processed at the
>> library-call level.  We may want our own JIT for performance.
>>
>> >   Another source of inspiration would be the VSIPL spec
>> >   <http://www.omgwiki.org/hpec/vsipl>. It's a standard for signal
>> >   processing routines in the embedded DSP world, and it comes with
>> >   "Core" and "Core Lite" profiles which might help decide what should
>> >   be included in a smaller subset of PDL.
>> >
>> >   Also on my wishlist is interoperability with libraries like
>> >   ITK <http://www.itk.org/>, VTK <http://www.vtk.org/>, and
>> >   yt <http://yt-project.org/>. They have interesting architectures,
>> >   especially for computation. Unfortunately, the first two are C++ based
>> >   and I don't have experience with combining C++ and XS.
>>
>> Thanks for all the references and ideas!
>>
>> > ## Better testing
>> >
>> >   PDL should make more guarantees about how types flow through the
>> >   system. This might be accomplished by adding assertions in the style of
>> >   Design-by-Contract, which can act as both a testable spec and
>> >   documentation. I'm working on the test suite right now on a branch, and
>> >   I hope to create a proof-of-concept of this soon.
>>
>> I think starting with the PDL::Tiny core and building out we could
>> clarify some of these issues.
>> >
>> >   I hope that this can help make PDL more consistent and easily
>> >   testable. There are still small inconsistencies that shouldn't be there
>> >   and that can be weeded out with testing. For example, what type is
>> >   expected for this code?
>> >
>> >   ```perl
>> >   use PDL;
>> >   print stretcher( sequence(float, 3) )->type;
>> >   ```
>> >
>> >   I would expect 'float', but it is actually 'double' under PDL
>> >   v2.007_04.
>>
>> This is a bug.  One thing that would be nice to have is
>> a way to trace the dataflow characteristics through the
>> PDL processing chains...
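>>
>> A contract like that could start life as an ordinary test (this particular
>> assertion currently fails, per the report above):
>>
>> ```perl
>> use strict;
>> use warnings;
>> use Test::More;
>> use PDL;
>>
>> # The type-flow guarantee we would like PDL to make:
>> is( stretcher( sequence(float, 3) )->type, 'float',
>>     'stretcher preserves the float type of its input' );
>>
>> done_testing;
>> ```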
>>
>>
>> > ## Incremental computation
>> >
>> >   I find that the way I grow my code is to slowly add modules that work
>> >   together in a pipeline. Running and rerunning this code through all the
>> >   modules is slow. To avoid that, I create multiple small programs that
>> >   read and write files to pass data from one script to the next. I was
>> >   looking for a solution and came across IncPy
>> >   <http://www.pgbovine.net/incpy.html>. It modifies the Python
>> >   interpreter to support automatic persistent memoization. I don't think
>> >   the idea has caught on, but I think it should, and perhaps Perl and PDL
>> >   are flexible enough to herald it as a CPAN module.
>>
>> Nice idea for improvement and ease of use.  If PDL methods are
>> implemented in a way that is compatible with Moo[se], then method
>> modifiers could be used for this.
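>>
>> A rough sketch of that idea with an `around` modifier (hypothetical method
>> name and cache layout, no invalidation, and it assumes the arguments and
>> result are Storable-friendly):
>>
>> ```perl
>> package My::Role::DiskMemoize;
>> use Moo::Role;
>> use Storable qw(freeze store retrieve);
>> use Digest::MD5 qw(md5_hex);
>>
>> $Storable::canonical = 1;   # stable hashing of the argument list
>>
>> # Replay the result of an expensive method ('analyze' is a placeholder)
>> # from disk when the same arguments have been seen before.
>> around analyze => sub {
>>     my ($orig, $self, @args) = @_;
>>     my $file = '/tmp/memo-' . md5_hex(freeze(\@args)) . '.sto';
>>     return ${ retrieve($file) } if -e $file;
>>     my $result = $self->$orig(@args);
>>     store(\$result, $file);
>>     return $result;
>> };
>>
>> 1;
>> ```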
>>
>> Thanks for the thoughts!
>> Chris
>>
>>
>


-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan
_______________________________________________
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
