> On Oct. 21, 2013, 8:56 a.m., Milian Wolff wrote: > > kio/kio/udsentry.cpp, line 51 > > <http://git.reviewboard.kde.org/r/113355/diff/1/?file=203549#file203549line51> > > > > have you tried with a QMap? It has a lower memory overhead than QHash. > > And for few items (which is the case here, if I understood correctly), it > > will also be faster than QHash. And thus probably also faster than the > > combined QHash + QVector lookup. > > > > The code would also become cleaner that way.
Thanks for your comments! I have not tried QMap yet. I can do it when I'm at home and I find some time, but I would guess that dropping the implicit sharing idea and using a QMap<uint, Field> (which is what you propose if I understand you correctly) will require more memory than my proposal. Note that the entire information about the uint keys and the positions in the QVector<Field> that they correspond to is effectively stored in a single pointer per UDSEntry (because the QHash<uint, int> is implicitly shared). On the other hand, a QMap needs a considerable amount of overhead to store that information (even if it might be lower than for QHash). http://doc.qt.digia.com/qq/qq19-containers.html shows that we need at the very least two pointers (backward and forward[0]) per node, plus the overhead that the memory allocator adds to every allocation (BTW, this overhead does not show up when using massif and your extremely cool massif-visualizer). About the performance: this is a very valid question, of course. I have only tested how my patch affects KIO::ListJob so far with a simple QElapsedTimer and found that it becomes about 30% faster, but I will try to write a benchmark that test all typical uses of UDSEntry inside the kioslaves (storing the strings and numbers in the UDSEntries, and then writing these to a QDataStream) and the applications (loading the UDSEntries from the stream, and accessing the strings and numbers). - Frank ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/113355/#review42058 ----------------------------------------------------------- On Oct. 20, 2013, 5:27 p.m., Frank Reininghaus wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://git.reviewboard.kde.org/r/113355/ > ----------------------------------------------------------- > > (Updated Oct. 20, 2013, 5:27 p.m.) > > > Review request for kdelibs. > > > Repository: kdelibs > > > Description > ------- > > This patch is based on some discussions on the kfm-devel mailing list, see > http://lists.kde.org/?t=137935784800003&r=1&w=2 > > Mark found out that KIO's UDSEntry class is one of the major consumers of > memory in applications which use KIO to list directories with a large number > of files, and I found a way use implicit sharing to drastically reduce the > amount of memory it needs. Many thanks to Milian for his great blog post > http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1, without which > I would probably not have had such ideas. > > > 1. The problem > > The UDSEntry keeps all sorts of information about files which can be stored > in a string (name, user, group, etc.) or in a long long (file size, > modification time, etc.). All these data can be accessed with a uint key, and > UDSEntry returns the corresponding QString or long long in O(1) time. > Internally, it stores the data in a QHash<uint, Field>, where Field is a > struct that has a QString and a long long member. > > The problem is that QHash needs quite a lot of memory to provide the O(1) > access, see http://doc.qt.digia.com/qq/qq19-containers.html for details, and > that the minimum capacity of a QHash seems to be 17, even though the number > of entries in a UDSEntry is often 8 in the rather typical standard kio_file > use case. > > > 2. Proposed solution > > (a) We can store the "Fields" in a QVector<Field>, which needs only very > little overhead compared to the memory that the actual "Fields" need. When > loading a UDSEntry from a QDataStream, we just append all "Fields" to this > QVector one by one. Moreover, we need a QHash<uint, int>, which maps each key > to the index of its Field in the QVector. This restructuring alone does not > reduce the memory usage, of course. > > (b) The key observation is that the QDataStream, which KIO::ListJob reads the > UDSEntries from, typically provides the different "Fields" in exactly the > same order. This means that the QHash<uint, int> is usually exactly the same > for every UDSEntry, and we can make use of implicit sharing to store only one > copy of this QHash. I've modified > > void UDSEntryPrivate::load(QDataStream &s, UDSEntry &a) > > such that it remembers the most recent QHash<uint, int> and just adds an > implicitly shared copy of it to "a" if the order of the Fields has not > changed. > > (c) Moreover, some of the QString Fields in the UDSEntries in one directory > are often the same, like, e.g., the user and the group. The "load" function > also remembers the most recently read values for each Field in a static > QVector<QString> and just puts an implicitly shared copy into the UDSEntry if > possible. > > > 3. Possible disadvantages > > (a) When using the "remove" member, the new version of UDSEntry does not > remove the Field from the QVector<Field>. This means that removing and adding > a "Field" repeatedly would let the memory usage grow indefinitely. According > to David (http://lists.kde.org/?l=kfm-devel&m=138052519927973&w=2), this > doesn't matter though because no known user of UDSEntry uses its remove() > member. Maybe we should remove remove (pun stolen grom David) in the > frameworks branch then? > > (b) In principle, the old version of UDSEntryPrivate::load(QDataStream&, > UDSEntry&) was reentrant. This is not the case for my changed version. > Reentrancy could be restored rather easily by protecting the access to the > static data with a mutex, but given that most of KIO is not supposed to be > used from outside the main thread AFAIK, I don't know if this is necessary. > > > 4. Changes since the first version of the patch which I posted in > http://lists.kde.org/?l=kfm-devel&m=137962995805432&w=2 > > (a) Implemented the minor changes suggested by David in > http://lists.kde.org/?l=kfm-devel&m=137975442807965&w=2 > > (b) Added a unit test to verify that storing and loading UDSEntries from a > stream works even if the order of the fields is permuted, and some fields are > removed or added in between. > > (c) Fixed a bug which was uncovered by the test: > cachedUdsFields.erase(cachedUdsFields.begin() + i, cachedUdsFields.end()) > instead of cachedUdsFields.erase(cachedUdsFields.begin() + i) > > (d) Use QVector::reserve to reserve the appropriate size for the > QVector<Field>. Saves some time when loading the UDSEntry, and reduces the > memory usage further. > > (e) Changed the type of the loop variable from quint32 to int to fix some > compiler warnings. > > > Diffs > ----- > > kio/kio/udsentry.cpp 1e1f503 > kio/tests/CMakeLists.txt 1019312 > kio/tests/udsentrytest.h PRE-CREATION > kio/tests/udsentrytest.cpp PRE-CREATION > > Diff: http://git.reviewboard.kde.org/r/113355/diff/ > > > Testing > ------- > > Old and new unit tests pass. The memory usage of Dolphin when loading a > directory with 100,000 files in Icons View is reduced from 165.4 MB to 113.3 > MB. Any application that uses a file dialog, a KDirLister, or anything else > that uses a KIO::ListJob to list directory contents should benefit from > similar savings. > > > Thanks, > > Frank Reininghaus > >
