Thanks for the reviews! Series pushed, comments below,
  Jarno

On Oct 6, 2014, at 11:52 AM, Ben Pfaff <b...@nicira.com> wrote:

> On Wed, Sep 24, 2014 at 11:31:47AM -0700, Jarno Rajahalme wrote:
>> Batching the cmap find improves the memory behavior with large cmaps
>> and can make searches twice as fast:
>>
>> $ tests/ovstest test-cmap benchmark 2000000 8 0.1 16
>> Benchmarking with n=2000000, 8 threads, 0.10% mutations, batch size 16:
>> cmap insert:    533 ms
>> cmap iterate:    57 ms
>> batch search:   146 ms
>> cmap destroy:   233 ms
>>
>> cmap insert:    552 ms
>> cmap iterate:    56 ms
>> cmap search:    299 ms
>> cmap destroy:   229 ms
>>
>> hmap insert:    222 ms
>> hmap iterate:   198 ms
>> hmap search:   2061 ms
>> hmap destroy:   209 ms
>>
>> Batch size 1 has a small performance penalty, but all other batch
>> sizes are faster than non-batched cmap_find().  The batch size 16 was
>> experimentally found better than 8 or 32, so now
>> classifier_lookup_miniflow_batch() performs the cmap find operations
>> in batches of 16.
>>
>> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
>
> Here, I think that it would be possible to use __builtin_constant_p()
> to inline operations on "long"-size bitmaps, which would allow users
> to get automatic benefit without having to be aware of the size of the
> map.  I don't know whether it's important enough though:
>
>> +/* More efficient access to a map of single ulong. */
>> +#define ULONG_FOR_EACH_1(IDX, MAP)                  \
>> +    for (unsigned long map__ = (MAP);               \
>> +         map__ && (((IDX) = raw_ctz(map__)), true); \
>> +         map__ = zero_rightmost_1bit(map__))
>> +
>> +#define ULONG_SET0(MAP, OFFSET) ((MAP) &= ~(1UL << (OFFSET)))
>> +#define ULONG_SET1(MAP, OFFSET) ((MAP) |= 1UL << (OFFSET))

You mean folding the above into the existing bitmap API and relying on
__builtin_constant_p() to inline the case where the bitmap size fits in
a "long".  I think it should work, but on MSVC it would be much slower,
unless there is an equivalent builtin there.
> It's nice, when it's possible, to allow callers to use any batch size
> they want and then loop internally as necessary to handle very large
> batches.  I don't know whether you considered that approach for
> classifier_lookup_miniflow_batch(), instead of defining
> CLASSIFIER_MAX_BATCH.  In this case, the actual restriction is only
> for Windows compilation, so we risk bugs that only show up on Windows.
> (I guess that the easy way to avoid that risk is to include the
> assertion
>     ovs_assert(n_maps <= CLASSIFIER_MAX_BATCH);
> on all platforms instead of just Windows.)
>
> On the other hand, limiting the batch size to the size of an unsigned
> long makes sense to me in the classifier_lookup_miniflow_batch()
> interface.

The only user (dpif-netdev) defines a batch size in the range of 256,
hence the larger-than-"ulong" max batch here.  However, my other patch
series moves this function to dpif-netdev, so this issue goes away :-)

> map_type and MAP_BITS are declared in an IMO awkward spot between the
> function explanatory comment and the function declaration.  I would
> move them inside the function code block.

Done!

> Here, I think that the Windows version should declare maps[] to have
> DIV_ROUND_UP(CLASSIFIER_MAX_BATCH, MAP_BITS) elements, rather than
> CLASSIFIER_MAX_BATCH elements:
>
>     const int n_maps = DIV_ROUND_UP(cnt, MAP_BITS);
>
>     #if !defined(__CHECKER__) && !defined(_WIN32)
>     map_type maps[n_maps];
>     #else
>     map_type maps[CLASSIFIER_MAX_BATCH];
>     ovs_assert(n_maps <= CLASSIFIER_MAX_BATCH);
>     #endif

Thanks.

> Why does a batch have 16 lookups instead of CHAR_BIT * sizeof(unsigned
> long)?  It's not obvious and the comment doesn't say.  Oh, I see, it's
> in the commit message, will you please copy that into the batch size
> comment?

Done.

> This is intricate code.  I spent some time reading it carefully.  I
> did not find any bugs.

Lucky me (keeping fingers crossed)...

> Thank you!
> Acked-by: Ben Pfaff <b...@nicira.com>

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev