Dear kwant-developers,

I would like to share with you the recent progress in the general Operators 
module. My github repository has been updated: 
https://github.com/PhiReck/generalOperator/ 


Good news first: 
In the last mail I said that the path finding is O(N^2), or at best O(N logN),
since for each hopping of the i-th operator we have to find all connected
hoppings in the (i+1)-th operator (connected meaning that hop[1] of the i-th
operator equals hop[0] of the (i+1)-th operator). However, we can do it in
O(N), provided that a dictionary lookup is O(1): for each operator we create a
dictionary whose keys are the site IDs of the where[0] sites and whose values
are lists of the corresponding positions in that operator's 'where'. (Note that
the summary 'genOp.pdf' on github contains a mistake: it states that each
operator creates the dictionary for the next operator.) For a given hopping of
the i-th operator, we then only have to ask the dictionary of the (i+1)-th
operator (dict[site]), which returns the positions of the connected hoppings
of the (i+1)-th operator.
Creating these dictionaries scales with the number of elements in the 'where'
of each operator, i.e. O(N_sites*N_Ops), where the number of operators, N_Ops,
is effectively O(1) -- at least for now, we need at most 2-3 operators.
Afterwards, for each hopping of the i-th operator we perform a single lookup in
the dictionary of the (i+1)-th operator, which is O(N_sites) overall as long as
a dictionary lookup is O(1).
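To make the idea concrete, here is a minimal Python sketch of the lookup
(the names are illustrative only, not the actual ones from the repository):

    # Illustrative sketch of the dictionary-based connection lookup.
    def make_start_site_dict(where):
        """Map the first site of every hopping in `where` to the positions
        at which it occurs.  Building this is O(len(where))."""
        start_sites = {}
        for pos, hop in enumerate(where):
            start_sites.setdefault(hop[0], []).append(pos)
        return start_sites

    def connected_hoppings(where_i, start_dict_next):
        """For every hopping of the i-th operator, return the positions of the
        connected hoppings of the (i+1)-th operator, i.e. those whose hop[0]
        equals our hop[1].  Each lookup is O(1), so the whole pass is O(N)."""
        return [start_dict_next.get(hop[1], []) for hop in where_i]

    # usage (sites labelled by integers for simplicity):
    where_1 = [(0, 1), (1, 2), (2, 3)]
    where_2 = [(1, 5), (2, 6), (3, 7)]
    paths = connected_hoppings(where_1, make_start_site_dict(where_2))
    # paths == [[0], [1], [2]]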
Indeed, I see that for the energy current (psi^\dagger_i H_ij H_jk psi_k),
where path finding is necessary, the initialization of the operator is linear
in the system size. More precisely, __init__() takes roughly 2 times longer
than the system creation in the example I studied, in which each of the
Hamiltonians is taken over all hoppings of the system. I have not yet tried to
make the init more efficient, i.e. to reduce the prefactor of the linear
dependence on system size, so I expect it can become faster than the system
creation.
For a plot of the data, see 'init-times-Ecurr.pdf' on my github.


Concerning the progress of the project, I have finished incorporating the
changes suggested by Joe: instead of alternating onsite-hopping operators, any
combination is now allowed by making it possible to form the product of
arbitrary operators. As a result, the 'where' of each operator is attached
directly to that operator instead of living in a separate where-list, which is
what my very first approach used.
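Just to illustrate the idea, the usage looks roughly like the sketch below
(the class names Onsite, Hopping and Op_Prod are placeholders here, not
necessarily the actual names in generalOperator.pyx):

    # Hypothetical usage sketch only; real names and signatures may differ.
    sigma_z = [[1, 0], [0, -1]]

    # Each factor carries its own 'where':
    spin = Onsite(syst, sigma_z, where=None)       # defined over the whole system
    ham = Hopping(syst, where=chosen_hoppings)     # restricted to some hoppings

    # The product object itself works out how the factors are connected:
    spin_current = Op_Prod(spin, ham)
    result = spin_current(psi)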
Before discussing the efficiencies, let me comment on a consequence of this
change -- from alternating operators with a separate where-list to individual
operators each carrying their own 'where' and allowing products of operators --
which I had not anticipated: previously, we only had to find the paths between
hopping operators, whereas now an onsite operator might come with a special
'where' (e.g. to be evaluated only in the left half of the system), which in
general makes path finding necessary also in products involving onsite
operators. In some sense this is a good development: before, the user would
have had to build the where-list by hand so that the restrictions of the onsite
'where' were respected, whereas now the code takes care of this itself.
However, for simple products of operators such as the spin current (sigma_xyz
\times Hamiltonian), this path finding would lead to an unnecessary increase of
the calculation time. Therefore, I decided to give onsite operators a boolean
attribute 'willNotBeCalled' (maybe a better name would be
'onlyToBeUsedInProduct'?) for the case that they are defined over the whole
system (where=None). This makes it possible not to create 'where=[site for
site in syst.sites]' but instead, when the product with another operator is
initialized, to reuse the other operator's 'where'. Most importantly, no paths
have to be found this way. As mentioned before, this is only possible if the
onsite operator has no 'where' restrictions, because such restrictions would be
ignored by this shortcut. (If the onsite 'where' is restricted, I don't think
the path finding can be omitted.)
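Schematically, the shortcut looks like this (simplified sketch with
illustrative names, not the actual implementation):

    # Simplified sketch of the 'willNotBeCalled' shortcut.
    class OnsiteSketch:
        def __init__(self, syst, where=None):
            # Defined over the whole system: no own 'where', so the product
            # can skip path finding for this factor.
            self.willNotBeCalled = (where is None)
            self.where = where

    def product_where(onsite, other_where):
        """When the product is initialized, an unrestricted onsite factor
        simply reuses its partner's 'where'; only a restricted onsite
        requires path finding."""
        if onsite.willNotBeCalled:
            return other_where      # no path finding necessary
        # A restricted onsite 'where' would have to be matched against
        # other_where via path finding (not shown here).
        raise NotImplementedError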

So let's come to the efficiencies that I have tested so far. 
I have previously shown you the operator __call__() times for the density
(psi^\dagger_i M_i psi_i) for different methods of calculating it (kwant,
python-recursive functions, c-recursive functions, ...). Back then, the result
was that the alternating operators are roughly *2 times slower* than the
'specialized' kwant.Density, probably because the recursive functions are still
called (once for each site).
Now I have checked the new 'Operator Product'. For some reason that I do not
understand yet, it is roughly *3 times slower* than the kwant result, even
though I see no reason why it should not reach the same speed as the
alternating operators.
For a plot of the data, see 'call-times-dens.pdf' on my github.

Previously I had only checked the __call__ efficiency; now I have also checked
the __init__() of the operators, again for the density. First of all, it is
indeed (roughly) linear, as expected (no path finding is needed for the density
anyway). However, the prefactors are quite different: compared to the kwant
results, the alternating operators are again *2-3 times slower*, and
unfortunately the 'Operator Product' is roughly *20-30 times slower* than
kwant. Again, I see no reason why there should be a difference between the old
(alternating operators) and the new (Op_prods) way, but I have not yet had time
to look into it more carefully and find the cause.
However, since the system creation is still *~10 times slower* than the
__init__() of the 'Operator Product', the __init__ efficiency is probably not
so important anyway, as long as not too many operators are initialized.
For a plot of the data, see 'init-times-dens.pdf' on my github.


In the future, I would like to try using C++ maps in Cython in order to avoid
the confusing auxiliary lists that are currently built from the python dict.
Moreover, I would like to understand why there is an efficiency difference
between the old (alternating Ops) and the new (Op Prods) way, and to improve
the efficiency if possible.

Concerning the efficiencies, I would like to hear your opinions. Of course, in
the best case the old operators (Density, Current, Source) would be as fast
with the generalized Operator class as with the specialized kwant operators,
both in calculation and in initialization. However, I guess this will not be
entirely achievable. Still, since creating the system and calculating the
scattering states is in most cases heavier than the operator anyway, this loss
in efficiency might be unimportant compared to the benefits of the generalized
Operator: easily creating new operators (see the beginning of
generalOperator.pyx on my github for examples such as the Density, Current,
etc.) and having the calculation routines only once (so far, each of Density,
Current and Source has its own _operate(), although it is very similar in all
cases).
I would be very interested to hear what you consider an acceptable loss in
efficiency, or in which cases efficiency is of major importance.



To finish, I have a technical question. So far, for each __call__ of the
operator, _get_orbs(..) is called in _eval_operator(..) to obtain the number of
orbitals and the starting index within the wave function for a given site.
However, I expect that for a finalized system these quantities cannot change
(e.g. there cannot be a parameter which, when its value is changed, alters the
ordering of the wave function or the number of orbitals on a site), right? If
that is the case, _get_orbs could already be done in __init__, which might be a
minor efficiency win.
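If that assumption holds, a cache along the following lines could be built in
__init__ (just a sketch; _get_orbs here is a stand-in for the existing helper
and its real signature may differ):

    # Sketch: cache (start orbital, number of orbitals) per site in __init__,
    # so that __call__ only does a dictionary lookup.
    def precompute_orbs(syst, sites, _get_orbs):
        return {site: _get_orbs(syst, site) for site in sites}

    # in _eval_operator(..) one would then simply use
    #     start, norbs = self._orb_cache[site]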


As always, I am looking forward to your comments and suggestions. I am aware
that I might not always be 100% clear, so if anything is confusing, please let
me know.
Best regards,
Phillipp
 
