Hi,

I have run into a potential 'for loop' bottleneck. Let me outline:

The following array describes bonds (connections) in a benzene molecule

    b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3,  4, 4, 4, 5,  5, 5, 6, 7,
8, 9, 10, 11],
         [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1,
2, 3,  4,  5]]

ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In
practical examples, the list can be much larger (N > 100.000 connections.

Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds
connecting those atoms must be deleted. I achieve this doing

i_0 = numpy.in1d(b[0], a)
i_1 = numpy.in1d(b[1], a)
b_i = numpy.where(i_0 | i_1)[0]
b = b[:,~(i_0 | i_1)]

If you find this approach lacking, feel free to comment.

This results in the following updated bond list

b = [[0,  0,  4,  4,  5,  5,  5,  6, 10, 11]
     [5,  6, 10,  5,  4, 11,  0,  0,  4,  5]]

This list is however not correct: Since atoms [1,2,3,7,8] have been
deleted, the remaining atoms with indices larger than the deleted atoms
must be decremented. I do this as follows:

for i in a:
    b = numpy.where(b > i, bonds-1, bonds)  (*)

yielding the correct result

b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
     [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]

The Python for loop in (*) may easily contain 50.000 iteration. Is there
a smart way to utilize numpy functionality to avoid this?

Thanks and best regards,

Mads

-- 
+---------------------------------------------------------+
| Mads Ipsen                                              |
+----------------------+----------------------------------+
| Gåsebæksvej 7, 4. tv | phone:              +45-29716388 |
| DK-2500 Valby        | email:      mads.ip...@gmail.com |
| Denmark              | map  :   www.tinyurl.com/ns52fpa |
+----------------------+----------------------------------+
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to