Re: Reimplement Scheme_hash_table using linear probing. (issue 559790043 by [email protected])

hanwenn Fri, 10 Apr 2020 22:38:26 -0700

Reviewers: dak,

Message:
>  In addition, I don't think that it is used to a degree where it would
significantly affect LilyPond's performance.


It is not yet. 

My plan is to plugin this into Grob and Prob and see if there is a
measurable speed improvement. If there is none, it's likely that your
double indexing scheme will also not bring much.


https://codereview.appspot.com/559790043/diff/563830043/input/regression/scheme-unit-test.ly
File input/regression/scheme-unit-test.ly (right):

https://codereview.appspot.com/559790043/diff/563830043/input/regression/scheme-unit-test.ly#newcode6
input/regression/scheme-unit-test.ly:6: #(ly:test-scheme-hash-table)
On 2020/04/10 21:43:28, dak wrote:
> This seems like a rather opaque way of testing functionality.

how about criticism that is constructive? What do you suggest instead?

https://codereview.appspot.com/559790043/diff/563830043/lily/include/scm-hash.hh
File lily/include/scm-hash.hh (right):

https://codereview.appspot.com/559790043/diff/563830043/lily/include/scm-hash.hh#newcode36
lily/include/scm-hash.hh:36: Entry *table_;
On 2020/04/10 21:43:28, dak wrote:
> Any reason this is not a C++ array?  We don't use C style arrays
elsewhere.

so we can skip the marking for GUILEV2.

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc
File lily/scm-hash.cc (right):

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode59
lily/scm-hash.cc:59: if (old_table[i].key != NULL)
On 2020/04/10 21:43:29, dak wrote:
> NULL is not a valid SCM value, nor is it guaranteed not to overlap
with valid
> SCM values that may be stored here.  Better use SCM_UNDEFINED here. 
Also SCM
> values should not be compared numerically but by using scm_is_eq . 
That allows
> making sure that there are no integer/SCM confusion (there is a
#define one can
> set for that purpose) which are a typical source for problems.

this would create overhead because scm_gc_calloc allocates data filled
with zeroes rather than SCM_xxx. I added some more comment about the 
assumptions to the header.

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode73
lily/scm-hash.cc:73: uintptr_t x = (uintptr_t) (key);
On 2020/04/10 21:43:29, dak wrote:
> For using the numerical value of an SCM, there is SCM_UNPACK.  Please
don't use
> C style casts.  They always succeed, even in cases where they are a
very bad
> idea.

Done.

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode81
lily/scm-hash.cc:81: {
On 2020/04/10 21:43:28, dak wrote:
> This only works for specific cases (uintptr_t == 4 or 8).  Instead of
creating
> our own implementation, how about submitting a patch to Guile
upstream?  That
> way there is better review for Guile-specific problems and we don't
have the
> maintenance cost in our own code.

no. we'd only get the benefits when we move to GUILE 3.2, whenever that
is released.

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode95
lily/scm-hash.cc:95: if (table_[i].key != NULL)
On 2020/04/10 21:43:28, dak wrote:
> NULL is not a valid SCM value, and SCM values should be compared using
scm_is_eq
> rather than !=.  Using SCM_UNDEFINED is the proper way of doing
things.

notice that we're not passing the NULL to GUILE anywhere.

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode111
lily/scm-hash.cc:111: *idx = int (hash (key) % cap_);
On 2020/04/10 21:43:29, dak wrote:
> If the capacity is going to be one less than a power of 2, it would
appear to
> make sense to use a power of 2 instead and use a mask rather than a
modulo
> operation.  In particular since the hash functions look like they are
designed
> in a manner where the individual bits are well-distributed.

no, that would constrain the table sizes, and therefore the space/time
tradeoff we make.

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode161
lily/scm-hash.cc:161: if (j <= gap)
On 2020/04/10 21:43:28, dak wrote:
> This stuff is completely inscrutable and without any comment.  It
would appear
> that it may depend on the capacity being one less than a power of 2
(see above)
> but without any analysis and any comment or justification, that is
quite
> unclear.

I added a wikipedia link. (This is standard undergrad CS stuff.)

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode170
lily/scm-hash.cc:170: while (next % cap_ != h);
On 2020/04/10 21:43:29, dak wrote:
> Fall-through without any assertion making sure that stuff worked
right?

do you have a suggestion about what to assert?

https://codereview.appspot.com/559790043/diff/563830043/lily/scm-hash.cc#newcode264
lily/scm-hash.cc:264: class Timestamp
On 2020/04/10 21:43:29, dak wrote:
> This stuff really has no place in this file.

moved to new file.

Description:
Reimplement Scheme_hash_table using linear probing.

Add a unit-test and micro-benchmark, called from
input/regression/scheme-unit-test.ly

The GUILE hash table implementation uses conflict resolution by
chaining. This means that hash lookups involve walking linked lists,
which is both relatively slow (the CPU cannot prefetch the next list
item), and takes up a lot of space (each {key, value} pair needs an
extra cons to form the linked list.

The micro-benchmark for lookup shows a 2x speedup compared to GUILE's
hashtables.

Please review this at https://codereview.appspot.com/559790043/

Affected files (+323, -42 lines):
  A input/regression/scheme-unit-test.ly
  M lily/include/scm-hash.hh
  M lily/scm-hash.cc

Re: Reimplement Scheme_hash_table using linear probing. (issue 559790043 by [email protected])

Reply via email to