On Thursday 23 July 2009 21:17:17, you wrote:
> On Wed, Jul 22, 2009 at 5:11 AM, Francesc Alted<fal...@pytables.org> wrote:
> > On Tuesday 21 July 2009 23:07:28, Kenneth Arnold wrote:
> >> (Use case: we have a read-only matrix that's an array of vectors.
> >> Given a probe vector, we want to find the top n vectors closest to it,
> >> measured by dot product. numpy's dot function does exactly what we
> >> want. But this runs in a multiprocess server, and these matrices are
> >> largeish, so I thought memmap would be a good way to let the OS handle
> >> sharing the matrix between the processes.)
> >>
> >> (Array _columns_ are stored contiguously, right?)
> >
> > No.  PyTables (as HDF5, NumPy and all the C world in general) stores the
> > arrays in a row-wise manner, not column-wise (like for example Matlab
> > which is based on Fortran).
>
> I meant that within a row, entries from different columns are stored
> contiguously. So we vehemently agree, though your terminology is
> probably better.
>
> > Well, if done properly, I/O in PyTables should not take much more than
> > numpy.memmap (in fact, it can be faster in many occasions).  You just
> > need to read/write arrays following the contiguous direction, i.e. the
> > most to the right among those orthogonal to the 'main' dimension (in
> > PyTables jargon).
>
> I think we are.
>
> The slow part of our code is a single line of Python:
> `numpy.dot(matrix, vector)`. `matrix` is a tall (thousands by 50)
> PyTables Array or CArray. `vector` is a row sliced out of it (1 by
> 50). PyTables performs extremely poorly in this case (70 seconds
> compared to 0.04 seconds for NumPy ndarrays). Here is the profiler
> output (%prun), Python 2.5, NumPy 1.3:
>
>     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>     285592   25.140    0.000   25.140    0.000 {method '_g_readSlice' of 'tables.hdf5Extension.Array' objects}
>     285592   15.211    0.000   27.501    0.000 array.py:407(_interpret_indexing)
>     285592    3.439    0.000   66.873    0.000 array.py:486(__getitem__)
>     285592    3.164    0.000    8.295    0.000 leaf.py:425(_processRange)
>     856776    2.902    0.000    4.487    0.000 utils.py:66(idx2long)
>          1    2.876    2.876   69.749   69.749 {numpy.core.multiarray.dot}
>     285592    2.719    0.000   27.859    0.000 array.py:565(_readSlice)
>    1142368    2.257    0.000    2.257    0.000 utils.py:44(is_idx)
>    1427963    1.782    0.000    1.782    0.000 {len}
>     285592    1.636    0.000    4.492    0.000 flavor.py:344(conv_to_numpy)
>     285592    1.261    0.000    1.261    0.000 numeric.py:180(asarray)
>     285592    1.232    0.000    5.724    0.000 flavor.py:110(array_of_flavor2)
>     571186    1.130    0.000    1.130    0.000 {isinstance}
>     285592    1.069    0.000    2.329    0.000 flavor.py:353(_conv_numpy_to_numpy)
>     285592    0.920    0.000    6.644    0.000 flavor.py:130(flavor_to_flavor)
>     285592    0.896    0.000    7.540    0.000 flavor.py:150(internal_to_flavor)
>     285592    0.644    0.000    0.644    0.000 {tables.utilsExtension.getIndices}
>     285592    0.535    0.000    0.535    0.000 leaf.py:243(<lambda>)
>     285593    0.527    0.000    0.527    0.000 {hasattr}
>     285592    0.411    0.000    0.411    0.000 {method 'append' of 'list' objects}
>          1    0.000    0.000   69.750   69.750 labeled_view.py:242(dot)
>          5    0.000    0.000    0.000    0.000 array.py:123(_getnrows)
>          1    0.000    0.000    0.000    0.000 labeled_view.py:93(__init__)
>
> (labeled_view.py is the only part that's our code, and it only calls
> numpy.core.multiarray.dot once.)

Yes, this kind of performance is expected.  The problem here is that you are 
treating PyTables objects as if they were regular NumPy matrices, but they 
are fundamentally different beasts.  In particular, accessing a single element 
of an array is much slower with PyTables for several reasons, the most 
important one being that every access has to go through several layers 
(indexing layers, the HDF5 layer, the filesystem layer...) just to fetch a 
single element, and this is *slow*.  That is what your profile shows: 
numpy.dot is pulling elements out of the PyTables Array one at a time, so 
those layers are crossed hundreds of thousands of times.
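
To make that concrete, here is a minimal sketch (the file name 'vectors.h5' 
and the node '/matrix' are made up for illustration) contrasting per-element 
access with reading whole rows; only the second form amortizes the 
layer-crossing cost over a full contiguous block:

import numpy as np
import tables

# Hypothetical file/node names, PyTables 2.x spelling of the open call.
h5 = tables.openFile('vectors.h5', mode='r')
matrix = h5.root.matrix                # e.g. a (nrows, 50) Array/CArray

# Slow: every scalar access crosses the indexing/HDF5/filesystem layers.
s = 0.0
for j in range(matrix.shape[1]):
    s += matrix[0, j] * matrix[1, j]

# Fast: one read per row brings a contiguous block into a NumPy ndarray,
# and the arithmetic then runs entirely in memory.
row0 = matrix[0]
row1 = matrix[1]
s = np.dot(row0, row1)

h5.close()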

To overcome this, what you need is to *block* the access to your data so that 
you retrieve a significant amount of data (a block) on each access.  This is 
the path followed by the recently introduced `tables.Expr` in 2.2b1.  
However, `tables.Expr` is mostly for operating on arrays element-wise, so it 
does not support linear algebra operations yet.  As I've said in another 
message, you could implement a *blocked* algorithm for matrix-matrix 
multiplication yourself on top of PyTables to get maximum performance.  Using 
this approach, I'd say you could get performance similar to (if not better 
than) numpy.memmap.
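
For instance, a blocked matrix-vector product along the lines below reads a 
chunk of rows per I/O operation and lets NumPy do the arithmetic in memory.  
This is only a sketch under assumed names (the file, the node and the 
`blocksize` default are invented), not an official PyTables API:

import numpy as np
import tables

def blocked_dot(pt_array, vector, blocksize=10000):
    """Compute numpy.dot(pt_array, vector) reading `blocksize` rows per access.

    `pt_array` is a PyTables Array/CArray of shape (nrows, ncols) stored
    row-wise; `vector` is a NumPy array of shape (ncols,).
    """
    nrows = pt_array.nrows
    out = np.empty(nrows, dtype='float64')
    for start in range(0, nrows, blocksize):
        stop = min(start + blocksize, nrows)
        block = pt_array[start:stop]          # one HDF5 read per block
        out[start:stop] = np.dot(block, vector)
    return out

# Hypothetical usage; file and node names are invented for the example.
h5 = tables.openFile('vectors.h5', mode='r')
matrix = h5.root.matrix                       # (thousands, 50) CArray
probe = matrix[42]                            # one row, read as an ndarray
scores = blocked_dot(matrix, probe)           # dot products against all rows
top10 = np.argsort(scores)[-10:][::-1]        # the 10 closest vectors
h5.close()

The block size trades memory for fewer HDF5 calls; a few thousand rows per 
read is normally enough to make the I/O overhead small compared with the dot 
products themselves.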

For example, Ulrich Drepper explains how to get good performance for matrix-
matrix multiplication on today's computers by using blocked memory access:

http://people.redhat.com/drepper/summit08-memperf.pdf

This is for operating on data in memory, but the technique is the same for 
data on disk.  For a general introduction to the blocking technique, you may 
want to have a look at my presentation at the last EuroSciPy conference:

http://www.euroscipy.org/presentations/slides/slides_alted.pdf

Hope this helps,

-- 
Francesc Alted
