On Dec. 10, 2014, 7:40 p.m., Frank Reininghaus wrote:
> > As for all the appends that you do on a QVector, you might be able to speed 
> > them up significantly if you know the size (which you do).
> > The reason for that potential speedup is append doing some stuff to 
> > determine if it can append or if it needs to grow the vector first (amongst 
> > things). That works very well when you don't know the size, but you don't 
> > need that "overhead" if you do know the size in advance.
> > 
> > My benchmark code is here: https://paste.kde.org/pxly3rryg (1 year lifetime)
> > 
> > The results i get (on - cough - windows, but with mingw. My qt was broken 
> > under linux):
> > "Append took: 278 ms"
> > "More complicated append took: 92 ms"
> > 
> > The above timing are just the insertion. The reserve() and resize() call 
> > are not in them. However, if i run the same benchmark with those calls 
> > included then the timings are as follows:
> > "Append took: 278 ms"
> > "More complicated append took: 101 ms"
> > 
> > The regular append(...) call took 278 ms for 10 million items.
> > The array assignment operator took merely 92 till 101 ms (~3x faster then 
> > append!). That is including a bounds check and using append if you go over 
> > your bounds.
> > I think you can only win speed with this in the avarage usecase.
> > 
> > Note: for this optimization you have to use "resize" on the QVector, not 
> > "reserve".
> > 
> > Disclaimer: i know this is most deffinately a micro optimization, but it's 
> > fun to play with options and see if you can pull out more.
> 
> Milian Wolff wrote:
>     your benchmark is flawed: your second loop also contains branches, which 
> the former does not. Both iterate over the same size, no? So you just want to 
> compare resize + operator[] vs. reserve + append? Yes, the former will be 
> faster, but afaik only slightly. Please write a proper benchmark with 
> QBENCHMARK and use `-perf -perfcounter instr:k`, instead of a bogus QTimer. 
> Also, you sure this was compiled with optimizations enabled?
>     
>     Furthermore, this is also a micro-optimization as the cost of append vs. 
> operator[] is negleglible compared to the cost of the allocations and the 
> rest of whats going on here. I really don't think this is worth it.
> 
> Mark Gaiser wrote:
>     My second loop intentionally contains branches since that's how it will 
> be used in reality. The reason for that is the udsentry being different in 
> size depending on the settings you pass to it. 8 fields is the common 
> usecase. The first loop doesn't contain branches - directly - but does do so 
> indirectly within QVector.
>     
>     "So you just want to compare resize + operator[] vs. reserve + append?"
>     Yes.
>     
>     Perf... :) I know. I disagree that this timer is bogus. QElapsedTimer is 
> good enough to get a rough view if something is faster then something else. 
> Perf helps when fine tuning it. Yes, this was compiled with optimizations in 
> release mode. At least my project was. I don't know for the Qt version 
> (5.4.0). That's just the package with the mingw installer from qt.io.
>     
>     I'm actually not sure if it's worth it either. From a readability point 
> of view it's still quite readable. But i don't know how much difference this 
> makes in real world 100.000 items sized folder (the benchmark for this RR). 
> Not much i think.

https://paste.kde.org/pbdsf39z5


- Milian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/118452/#review71736
-----------------------------------------------------------


On Dec. 9, 2014, 10:44 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/118452/
> -----------------------------------------------------------
> 
> (Updated Dec. 9, 2014, 10:44 p.m.)
> 
> 
> Review request for KDE Frameworks and David Faure.
> 
> 
> Repository: kio
> 
> 
> Description
> -------
> 
> I am continuing to split up https://git.reviewboard.kde.org/r/113355/ , which 
> attempts to make UDSEntry more efficient memory and CPU-wise, into 
> independent parts. This is the third step after 
> https://git.reviewboard.kde.org/r/113591/ and 
> https://git.reviewboard.kde.org/r/115739/ .
> 
> The present patch modifies the internal data storage of UDSEntry. UDSEntry 
> contains a mapping from unsigned int keys to "Field" values, where Field is a 
> struct that contains a QString and a long long (some keys correspond to 
> numeric values, like size, date, etc, whereas others, like user and group, 
> correspond to a QString).
> 
> Currently, UDSEntry stores all data in a QHash<uint, Field> internally. This 
> ensures that everything can be accessed in O(1) time, but is not very 
> efficient memory-wise because a separate memory allocation is done for each 
> hash node.
> 
> I propose to change this and store both the uint keys and the Field values in 
> a QVector each. This means that accessing a value becomes a O(n) operation, 
> since the entire QVector of keys may need to be scanned in order to find a 
> value, but because the number n of values in a UDSEntry is usually quite 
> small and can currently not exceed a number ~100, this should not matter in 
> practice.
> 
> Some parts of https://git.reviewboard.kde.org/r/113355/ are still missing:
> 
> (a) The QVectors which store the keys (which are usually the same for all 
> items in a directory) are not shared yet. Changing this would reduce the 
> memory usage further, but I decided to keep this change separate in order to 
> keep the current patch small and easy to understand. Moreover, this makes it 
> easier to benchmark other similar approaches (e.g., replace QVector by 
> std::vector, or store keys and values together in a 
> std::vector<std::pair<uint,Field>>).
> 
> (b) No space is reserved in the vectors when key/value pairs are inserted one 
> by one. Implementing this would make UDSEntry faster on the slave side (since 
> repeated re-allocations would not be necessary any more), but this can be 
> done in a later patch. Moreover, it might not be needed any more if UDSEntry 
> is not used directly any more on the slave side, as suggested on the 
> frameworks mailing list by Aaron (good idea BTW!). 
> 
> 
> Diffs
> -----
> 
>   autotests/udsentry_benchmark.cpp b5fa8d6 
>   src/core/udsentry.h 98a7035 
>   src/core/udsentry.cpp b806e0e 
>   src/ioslaves/file/file.cpp 1a2a767 
>   tests/udsentrybenchmark.cpp d9a118c 
> 
> Diff: https://git.reviewboard.kde.org/r/118452/diff/
> 
> 
> Testing
> -------
> 
> Unit tests still pass.
> 
> The memory usage of listjobtest with a directory with 100,000 files is 
> reduced from 71344 K to 35392 K according to KSysGuard. I see similar savings 
> when opening the directory in Dolphin.
> 
> I still haven't set up a Qt5/KF5 build in release mode (shame on me!), so I 
> cannot present any benchmark results.
> 
> 
> File Attachments
> ----------------
> 
> Benchmark results
>   
> https://git.reviewboard.kde.org/media/uploaded/files/2014/12/09/038e443c-78eb-443b-b33a-b451b28d10ea__UDSEntry-benchmarks.png
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

_______________________________________________
Kde-frameworks-devel mailing list
Kde-frameworks-devel@kde.org
https://mail.kde.org/mailman/listinfo/kde-frameworks-devel

Reply via email to