Reviewers: dak, Message: > In addition, I don't think that it is used to a degree where it would significantly affect LilyPond's performance.
It is not yet. My plan is to plugin this into Grob and Prob and see if there is a measurable speed improvement. If there is none, it's likely that your double indexing scheme will also not bring much. https://codereview.appspot.com/559790043/diff/563830043/input/regression/scheme-unit-test.ly File input/regression/scheme-unit-test.ly (right): https://codereview.appspot.com/559790043/diff/563830043/input/regression/scheme-unit-test.ly#newcode6 input/regression/scheme-unit-test.ly:6: #(ly:test-scheme-hash-table) On 2020/04/10 21:43:28, dak wrote: > This seems like a rather opaque way of testing functionality. how about criticism that is constructive? What do you suggest instead? https://codereview.appspot.com/559790043/diff/563830043/lily/include/scm-hash.hh File lily/include/scm-hash.hh (right): https://codereview.appspot.com/559790043/diff/563830043/lily/include/scm-hash.hh#newcode36 lily/include/scm-hash.hh:36: Entry *table_; On 2020/04/10 21:43:28, dak wrote: > Any reason this is not a C++ array? We don't use C style arrays elsewhere. so we can skip the marking for GUILEV2. https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc File lily/scm-hash.cc (right): https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode59 lily/scm-hash.cc:59: if (old_table[i].key != NULL) On 2020/04/10 21:43:29, dak wrote: > NULL is not a valid SCM value, nor is it guaranteed not to overlap with valid > SCM values that may be stored here. Better use SCM_UNDEFINED here. Also SCM > values should not be compared numerically but by using scm_is_eq . That allows > making sure that there are no integer/SCM confusion (there is a #define one can > set for that purpose) which are a typical source for problems. this would create overhead because scm_gc_calloc allocates data filled with zeroes rather than SCM_xxx. I added some more comment about the assumptions to the header. https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode73 lily/scm-hash.cc:73: uintptr_t x = (uintptr_t) (key); On 2020/04/10 21:43:29, dak wrote: > For using the numerical value of an SCM, there is SCM_UNPACK. Please don't use > C style casts. They always succeed, even in cases where they are a very bad > idea. Done. https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode81 lily/scm-hash.cc:81: { On 2020/04/10 21:43:28, dak wrote: > This only works for specific cases (uintptr_t == 4 or 8). Instead of creating > our own implementation, how about submitting a patch to Guile upstream? That > way there is better review for Guile-specific problems and we don't have the > maintenance cost in our own code. no. we'd only get the benefits when we move to GUILE 3.2, whenever that is released. https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode95 lily/scm-hash.cc:95: if (table_[i].key != NULL) On 2020/04/10 21:43:28, dak wrote: > NULL is not a valid SCM value, and SCM values should be compared using scm_is_eq > rather than !=. Using SCM_UNDEFINED is the proper way of doing things. notice that we're not passing the NULL to GUILE anywhere. https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode111 lily/scm-hash.cc:111: *idx = int (hash (key) % cap_); On 2020/04/10 21:43:29, dak wrote: > If the capacity is going to be one less than a power of 2, it would appear to > make sense to use a power of 2 instead and use a mask rather than a modulo > operation. In particular since the hash functions look like they are designed > in a manner where the individual bits are well-distributed. no, that would constrain the table sizes, and therefore the space/time tradeoff we make. https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode161 lily/scm-hash.cc:161: if (j <= gap) On 2020/04/10 21:43:28, dak wrote: > This stuff is completely inscrutable and without any comment. It would appear > that it may depend on the capacity being one less than a power of 2 (see above) > but without any analysis and any comment or justification, that is quite > unclear. I added a wikipedia link. (This is standard undergrad CS stuff.) https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode170 lily/scm-hash.cc:170: while (next % cap_ != h); On 2020/04/10 21:43:29, dak wrote: > Fall-through without any assertion making sure that stuff worked right? do you have a suggestion about what to assert? https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode264 lily/scm-hash.cc:264: class Timestamp On 2020/04/10 21:43:29, dak wrote: > This stuff really has no place in this file. moved to new file. Description: Reimplement Scheme_hash_table using linear probing. Add a unit-test and micro-benchmark, called from input/regression/scheme-unit-test.ly The GUILE hash table implementation uses conflict resolution by chaining. This means that hash lookups involve walking linked lists, which is both relatively slow (the CPU cannot prefetch the next list item), and takes up a lot of space (each {key, value} pair needs an extra cons to form the linked list. The micro-benchmark for lookup shows a 2x speedup compared to GUILE's hashtables. Please review this at https://codereview.appspot.com/559790043/ Affected files (+323, -42 lines): A input/regression/scheme-unit-test.ly M lily/include/scm-hash.hh M lily/scm-hash.cc
