Hi Adel,

I read through this thread and I'm still not really sure what is being proposed.

In the first email in the thread it seems as if you are proposing a wrapper for wavefunctions so that they can be indexed by site. It seems that you want to be able to write:
    psi(i).dagger() * syst.hamiltonian(i, j, params) * psi(j)

This is a fine proposal. I could imagine that a simple implementation of such a wrapper would not need any exotic storage formats, C-level accessors or anything: finalized systems already contain an attribute 'site_ranges' that allows for constant-time access to the orbital offset of a site. I actually proposed such a wrapper some time ago, but my proposal was rejected [1] because it did not give vectorized access to the wavefunction elements. It is not obvious to me how to give vectorized access without copying.

Then after this proposal the thread starts discussing _LocalOperator (which is an internal detail that was never meant to be extended or user-facing) and lots of details about implementation strategies in C++ for *something*, but it is not clear to me what the end goal is. Perhaps you want to be able to express arbitrary operators more easily? Or are you proposing a refactoring of the kwant operators module to make it more readable? A clarification would be very useful for me.

Thanks,
Joe

[1]: https://gitlab.kwant-project.org/kwant/kwant/merge_requests/47

On 5/14/19 3:14 PM, Adel Kara Slimane wrote:
> Hello Christoph,
>
> I have a few additions to make on top of my previous mail; I hope I am not wasting your time with trivial or unhelpful suggestions.
>
> I have found a C++ library, Armadillo <http://arma.sourceforge.net>, that can do the trick for implementing a fast accessor, with built-in matrix-vector products.
>
> Let's assume we have the wavefunction and the Hamiltonian in flat contiguous arrays. This library has Row, Column, Matrix and SparseMatrix implementations, with member functions like hermitian conjugate and transpose, and with operations between them like sum and multiplication.
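[Editor's note: a minimal pure-Python sketch of the site-indexed wrapper discussed above, built only on 'site_ranges'. It assumes 'site_ranges' is a sorted sequence of (first_site, norbs, orb_offset) triples with a terminating entry; check the Kwant documentation for the exact format. The class name and toy data are hypothetical.]

```python
from bisect import bisect

import numpy as np


class SiteIndexedWavefunction:
    """Index a flat wavefunction array by site number.

    'site_ranges' is assumed to be a sorted sequence of
    (first_site, norbs, orb_offset) triples, modelled on the
    attribute of finalized Kwant systems (not the real API).
    """

    def __init__(self, psi, site_ranges):
        self.psi = np.asarray(psi)
        self.site_ranges = site_ranges
        self._firsts = [first for first, _, _ in site_ranges]

    def __call__(self, site):
        # Locate the range containing this site (binary search).
        i = bisect(self._firsts, site) - 1
        first, norbs, orb_offset = self.site_ranges[i]
        start = orb_offset + (site - first) * norbs
        return self.psi[start:start + norbs]  # a view, no copy


# Toy example: sites 0-2 have 1 orbital, sites 3-4 have 2 orbitals.
site_ranges = [(0, 1, 0), (3, 2, 3), (5, 0, 7)]
psi = np.arange(7, dtype=complex)
w = SiteIndexedWavefunction(psi, site_ranges)
value = w(3).conj() @ np.eye(2) @ w(4)
```

With such a wrapper, the expression above becomes roughly `w(i).conj() @ h_ij @ w(j)`; the per-site slices are numpy views, so no data is copied.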
> The good thing about it is that one can initialise such objects with pointers to existing memory, without copying. Here are the constructors that one could use:
>
> For vectors <http://arma.sourceforge.net/docs.html#Col>:
>
> |vec(ptr_aux_mem, number_of_elements, copy_aux_mem = true, strict = false)|
>
> For matrices <http://arma.sourceforge.net/docs.html#Mat>:
>
> |mat(ptr_aux_mem, n_rows, n_cols, copy_aux_mem = true, strict = false)|
>
> So one can wrap such objects in Cython, stack-allocate them with the correct memory pointers and sizes, and then use them to write the wanted operator expression in a compact way. In terms of speed, I expect it to be nearly as fast as a "by-hand" implementation, since the expression will be cythonised/compiled, thus ending up as similar nested for-loops after compilation.
>
> If I understand well, matrices, for example the Hamiltonian, don't really have a low-level representation, since "params" needs to be provided first to create a flat matrix of only complex values. That would be the point of "_eval_hamiltonian()" in the "_LocalOperator" class: it returns a BlockSparseMatrix, which is basically a flat array containing the matrices of all the needed hoppings given in the "where" list. I have a question about that:
>
> Would it be as fast to compute the Hamiltonian little by little while calculating the operator's value on each hopping of the "where" list? To show what I mean exactly, I have a code example <https://gist.github.com/AdelKS/e1dc8ab625c96138ee63e24d0a0ad4f2> (I couldn't test it yet unfortunately) using the "tinyarray" Python package. The idea would then be to use Armadillo objects instead of tinyarray, and have the code compilable. This approach would have the benefit of not using "_eval_hamiltonian()" nor the "BlockSparseMatrix" class, but will need the extra Armadillo wrapping code, and mapping the return of "syst.hamiltonian" to a Matrix object.
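[Editor's note: the per-hopping evaluation idea above can be sketched in plain Python/numpy. This is not the gist's actual code; 'ham', 'offset' and 'norbs' are hypothetical stand-ins for 'syst.hamiltonian' and the site_ranges lookup.]

```python
import numpy as np


def operator_value(bra, ket, where, ham, offset, norbs):
    """Accumulate <bra| H |ket> over hoppings, evaluating each
    Hamiltonian block on the fly instead of pre-building a
    BlockSparseMatrix.  'ham(i, j)' returns the (norbs[i], norbs[j])
    block and 'offset(i)' the orbital offset of site i (both are
    hypothetical stand-ins for the Kwant internals)."""
    total = 0j
    for i, j in where:
        bi = bra[offset(i):offset(i) + norbs[i]]
        kj = ket[offset(j):offset(j) + norbs[j]]
        total += bi.conj() @ ham(i, j) @ kj
    return total


# Toy two-site example with one orbital per site.
psi = np.array([1.0 + 0j, 2.0 + 0j])
H = {(0, 1): np.array([[0.5 + 0j]]), (1, 0): np.array([[0.5 + 0j]])}
val = operator_value(psi, psi, [(0, 1), (1, 0)],
                     lambda i, j: H[i, j],
                     lambda i: i, [1, 1])
```

Whether this beats the pre-built BlockSparseMatrix presumably depends on how often the same operator is re-evaluated: building the blocks once and reusing them amortises the 'syst.hamiltonian' calls.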
>
> Thank you,
>
> Adel KARA SLIMANE
>
> On Mon, 13 May, 2019 at 4:38 PM, Adel Kara Slimane <[email protected]> wrote:
>> Hello Christoph,
>>
>> Thank you for your detailed answer.
>>
>> I convinced myself with a small C++ benchmark that summing over the complex values of a vector<vector<complex<double>>> array is more than twice as slow as summing over a flat, same-size, dynamically allocated array (I attached the code to this mail). About the memory usage, I don't understand why it would take significantly more space, but maybe that is because I first suggested an array of Python objects instead of compiled objects, which take more space?
>>
>> I agree, and I am interested in coding the solution you are suggesting. I would like to suggest a further development to it: instead of returning a simple list of 2D arrays without copying anything, we use an "accessor" object that would enable matrix products:
>>
>> 1. It gets initialised with the system "syst".
>> 2. Calling "accessor.wavefunction(bra, site, orbital)" (or some similar spelling) returns "wavefunction[orbital + offset(site)]" with the right offset.
>> 3. The matrix inner product is implemented: "accessor.wavefunction(bra, site) * accessor.hamiltonian(site) * accessor.wavefunction(ket, site)" would be a valid expression (or something similar, with the same idea of having an apparent matrix-vector or matrix-matrix product).
>>
>> The third bullet point raises some questions for me though:
>>
>> * What do you think of it?
>> * I need to understand how the Hamiltonian is represented in order to think of ways to implement that, but I can't grasp what the low-level representation of the Hamiltonian is when the system is finalised. Is it a flat array too? When I look at "_eval_hamiltonian()" in the class "_LocalOperator", I see that it uses the return value of "syst.hamiltonian()", then converts it into a
>> Python tinyarray.matrix object, then converts it once again into a BlockSparseMatrix.
>>
>> Thank you,
>>
>> Adel
>>
>> On Fri, 10 May, 2019 at 6:15 PM, Christoph Groth <[email protected]> wrote:
>>> Hi Adel,
>>>
>>> Your understanding of the way wave functions (and other observables) are stored is correct. You are also right that working with low-level observables is inconvenient and could be improved. I'll explain how the current situation came about, and offer a possible solution. As always, we're glad to hear comments or suggestions.
>>>
>>> I still think that our original design choice to store everything in one flat array was a good one. The alternatives you mention would require much more RAM and would also be significantly slower due to decreased memory locality. If we were to store a list of tinyarrays, that would require storing one Python object (the tinyarray) per site. For example, if we were to do this for a system with two orbitals per site, that would mean a memory cost of 8 + 56 = 64 bytes per site. With the current approach, we require two complex numbers, that is 16 bytes per site. What's more, accessing this memory is no longer sequential, which makes it even slower. (You could object that finalized builders already store O(N) Python objects, namely the sites and their tags. But (1) that is not true for general Kwant low-level systems, and (2) we are working on solving this problem for finalized builders as well.) The other solutions that you mention have similar problems.
>>>
>>> What we could have done is sort the sites according to their number of orbitals, and then for each block of sites store a 2d array of shape (num_sites, num_orbs). But back when the low-level format was established, the number of orbitals of a site was not fixed, and so this could not have been done.
>>> The reason why the number was not fixed was not so much a false sense of generality, but rather that we didn't see a good way to specify the number of orbitals per site. We only later had the idea to fix the number of orbitals per site family. (The above solution would have the inconvenience that the wave function would no longer be a simple array, but rather a list of 2d arrays. This exposes more structure, but it also makes things like normalizing a wave function more difficult.)
>>>
>>> Here is how you can efficiently realize the above solution with a couple of lines of code: check out the attribute 'site_ranges' of Kwant systems [1]. With this information, you can write a function 'unpack' that takes a system and a numpy array with one entry per orbital (for example a wave function), and returns a list of 2d arrays that, without copying, refer to the data in a way that simplifies working with sites. If this sounds like a good idea and you write this function, be sure to share it. We might want to include it in Kwant as a method of 'System'.
>>>
>>> Cheers,
>>> Christoph
>>>
>>> [1] https://kwant-project.org/doc/1/reference/generated/kwant.system.System#kwant.system.System
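[Editor's note: a minimal sketch of the 'unpack' function Christoph describes. It assumes 'site_ranges' is a sorted sequence of (first_site, norbs, orb_offset) triples whose last entry marks the total numbers of sites and orbitals, modelled on the attribute of finalized Kwant systems; the terminator format is an assumption here.]

```python
import numpy as np


def unpack(site_ranges, array):
    """Return a list of 2d views into 'array', one per range of sites
    with a uniform number of orbitals.  Each view has shape
    (num_sites_in_range, norbs) and shares memory with 'array':
    nothing is copied, since reshaping a contiguous slice yields
    a view."""
    views = []
    for (first, norbs, offset), (next_first, _, next_offset) in zip(
            site_ranges, site_ranges[1:]):
        block = array[offset:next_offset]
        views.append(block.reshape(next_first - first, norbs))
    return views


# Toy example: 3 sites with 1 orbital, then 2 sites with 2 orbitals.
site_ranges = [(0, 1, 0), (3, 2, 3), (5, 0, 7)]
psi = np.arange(7, dtype=complex)
blocks = unpack(site_ranges, psi)
```

Because the blocks are views, writing to `blocks[k][i, m]` modifies the original wave function in place, and operations like normalization can still be done on the flat array.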