[ https://issues.apache.org/jira/browse/MAHOUT-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914935#action_12914935 ]

Dawid Weiss commented on MAHOUT-253:
------------------------------------

We compared HPPC against a few other collection libraries and think the 
observed performance gain is due to the following factors:

- a different hashing algorithm in the hash map implementation (better key 
distribution); we use Murmur hashing as the default hash for keys.

- very small methods (with contracts checked via assertions) that inline well in loops.

- we can rewrite red-hot, performance-critical code to use the collection buffers 
directly (for example, by copying array references into local variables).
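To illustrate why the hash function matters for an open-addressing map, here is a minimal Java sketch. It assumes a power-of-two table indexed by `hash & mask` with linear probing, and uses the MurmurHash3 32-bit finalizer (`fmix32`) as the key mixer. The class and method names (`MurmurIntSetDemo`, `IntHashSet`, `distinctSlots`) are hypothetical and are not HPPC's actual classes. Contracts are checked with `assert`, so they cost nothing when assertions are disabled.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: open-addressing int set that spreads keys with the
 * MurmurHash3 32-bit finalizer. NOT HPPC's implementation.
 */
public class MurmurIntSetDemo {

    /** MurmurHash3 fmix32: every input bit influences every output bit. */
    static int mix(int k) {
        k ^= k >>> 16;
        k *= 0x85ebca6b;
        k ^= k >>> 13;
        k *= 0xc2b2ae35;
        k ^= k >>> 16;
        return k;
    }

    /** Fixed-capacity set with linear probing; no resizing, so keep it sparse. */
    static final class IntHashSet {
        final int[] keys;
        final boolean[] allocated;
        final int mask;

        IntHashSet(int capacity) {
            // Contract checked via an assertion (free when -ea is off).
            assert Integer.bitCount(capacity) == 1 : "capacity must be a power of two";
            keys = new int[capacity];
            allocated = new boolean[capacity];
            mask = capacity - 1;
        }

        /** Returns true if the key was newly added. */
        boolean add(int key) {
            int slot = mix(key) & mask;
            while (allocated[slot]) {
                if (keys[slot] == key) return false; // already present
                slot = (slot + 1) & mask;            // linear probe
            }
            allocated[slot] = true;
            keys[slot] = key;
            return true;
        }

        boolean contains(int key) {
            int slot = mix(key) & mask;
            while (allocated[slot]) {
                if (keys[slot] == key) return true;
                slot = (slot + 1) & mask;
            }
            return false;
        }
    }

    /** Counts distinct home slots for keys 0, stride, 2*stride, ... */
    static int distinctSlots(boolean useMix, int capacity, int n, int stride) {
        Set<Integer> slots = new HashSet<>();
        for (int i = 0; i < n; i++) {
            int key = i * stride;
            slots.add((useMix ? mix(key) : key) & (capacity - 1));
        }
        return slots.size();
    }

    public static void main(String[] args) {
        // Keys differing only in high bits all share one slot under an identity hash.
        System.out.println("identity: " + distinctSlots(false, 256, 100, 256)); // 1
        System.out.println("murmur:   " + distinctSlots(true, 256, 100, 256));

        IntHashSet set = new IntHashSet(256);
        for (int i = 0; i < 100; i++) set.add(i * 256);
        System.out.println(set.contains(512) + " " + set.contains(513)); // true false
    }
}
```

With the identity hash, all 100 keys (multiples of 256) land in slot 0 and degrade into one long probe chain. The finalizer spreads them across the table.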

Some of these performance benchmarks are executed on the build server at:
http://builds.carrot2.org/browse/HPPC-BENCHMARK-40/artifact/Benchmarks-Report

For example, this report compares various iteration strategies:
http://builds.carrot2.org/browse/HPPC-BENCHMARK-40/artifact/Benchmarks-Report/com.carrotsearch.hppc.IterationSpeedBenchmark.methods.html
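As a rough picture of what "iteration strategies" can mean for an array-backed primitive list, here is a hedged Java sketch of three styles: indexed access through a tiny accessor, a cursor object, and direct buffer access with the array reference and size copied into locals. The `IntList` class and its `buffer`/`elementsCount` fields are hypothetical stand-ins, not HPPC's API. No timings are claimed here; relative speed is what the benchmark report above measures, and it depends on the JVM and hardware.

```java
import java.util.Arrays;

/**
 * Hypothetical array-backed int list (not HPPC's class) used to contrast
 * three iteration styles. Field names are illustrative assumptions.
 */
public class IterationStylesDemo {

    static final class IntList {
        public int[] buffer = new int[16]; // exposed internal storage
        public int elementsCount;          // number of valid slots in buffer

        void add(int v) {
            if (elementsCount == buffer.length) {
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
            buffer[elementsCount++] = v;
        }

        int get(int index) {
            assert index >= 0 && index < elementsCount : "index out of range";
            return buffer[index];
        }

        Cursor cursor() { return new Cursor(); }

        /** Cursor-style iterator: exposes the current value without boxing. */
        final class Cursor {
            int index = -1;
            int value;

            boolean next() {
                if (index + 1 >= elementsCount) return false;
                value = buffer[++index];
                return true;
            }
        }
    }

    // 1) Indexed access through a small accessor method (a good inlining candidate).
    static long sumByGet(IntList list) {
        long sum = 0;
        for (int i = 0; i < list.elementsCount; i++) sum += list.get(i);
        return sum;
    }

    // 2) Cursor iteration: no boxing, but one cursor object per traversal.
    static long sumByCursor(IntList list) {
        long sum = 0;
        for (IntList.Cursor c = list.cursor(); c.next(); ) sum += c.value;
        return sum;
    }

    // 3) Direct buffer access: copy the array reference and size into locals
    //    so the loop body reads locals instead of re-reading fields.
    static long sumByBuffer(IntList list) {
        final int[] buffer = list.buffer;
        final int size = list.elementsCount;
        long sum = 0;
        for (int i = 0; i < size; i++) sum += buffer[i];
        return sum;
    }

    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 1; i <= 1000; i++) list.add(i);
        // prints "500500 500500 500500"
        System.out.println(sumByGet(list) + " " + sumByCursor(list) + " " + sumByBuffer(list));
    }
}
```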

I have a presentation slide comparing performance across various libraries, but 
of course it may be very misleading (the results are heavily architecture- and 
JVM-dependent). You can check out all of the code from SVN and run it on your own 
machines (the hppc-others project contains the bigram counting example).

https://carrot2.svn.sourceforge.net/svnroot/carrot2/labs/hppc/trunk

As for Maven, HPPC is published from our labs server: 
http://repository.carrotsearch.com/labs/releases

All available versions are at:
http://repository.carrotsearch.com/labs/releases/com/carrotsearch/hppc/

> Proposal for high performance primitive collections.
> ----------------------------------------------------
>
>                 Key: MAHOUT-253
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-253
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Utils
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: hppc-1.0-dev.zip
>
>
> A proposal for a template-driven collections library (lists, sets, maps, 
> deques), with specializations for Java primitive types to save memory and 
> increase performance. The "templates" are regular Java classes written with 
> generics and certain "intrinsics", that is, blocks replaceable by a 
> regexp preprocessor. This lets one write the code once, immediately test it 
> (tests are also templates) and generate primitive versions from a single 
> source.
> An additional interesting part is the benchmarking subsystem written on top 
> of JUnit ;)
> There are major differences from the Java Collections API, most notably no 
> interfaces and interface-compatible views over sub-collections or key/value 
> sets. These classes also expose their internal implementation (buffers, 
> addressing, etc.) so that the code can be optimized for a particular use case.
> These motivations are further discussed here, together with an API overview.
> http://www.carrot-search.com/download/hppc/index.html
> I am curious what you think about it. If folks like it, Carrot Search will 
> donate the code to Mahout (or Apache Commons?) and will maintain it (because 
> we plan to use it in our internal projects anyway).
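The quoted proposal describes generic Java "templates" specialized into primitive versions by a regexp preprocessor. The toy Java sketch below shows that idea only in outline: one generic class, written as ordinary Java, rewritten by ordered regex substitutions into an `int` or `double` variant. The placeholder name `KType`, the `KTypeStack` template, and the substitution rules are assumptions for illustration; this is not HPPC's actual preprocessor, and it ignores the "intrinsic" blocks the proposal mentions.

```java
import java.util.regex.Matcher;

/**
 * Toy regex template specializer (hypothetical; not HPPC's preprocessor).
 * Assumes templates use the placeholder type name "KType".
 */
public class TemplateSpecializerDemo {

    /** A generic "template" class that is itself valid, testable Java. */
    static final String TEMPLATE =
          "public class KTypeStack<KType> {\n"
        + "    KType[] buffer;\n"
        + "    int size;\n"
        + "    public void push(KType e) { buffer[size++] = e; }\n"
        + "    public KType pop() { return buffer[--size]; }\n"
        + "}\n";

    /**
     * Rewrites the template for one primitive type, e.g. ("int", "Int").
     * Rules are applied in order; each is a {regex, replacement} pair.
     */
    static String specialize(String template, String primitive, String prefix) {
        String[][] rules = {
            {"<KType>", ""},               // drop the type parameter declaration
            {"\\bKType(?=[A-Z])", prefix}, // class names: KTypeStack -> IntStack
            {"\\bKType\\b", primitive},     // bare type uses: KType[] -> int[]
        };
        String out = template;
        for (String[] rule : rules) {
            out = out.replaceAll(rule[0], Matcher.quoteReplacement(rule[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.print(specialize(TEMPLATE, "int", "Int"));
        System.out.print(specialize(TEMPLATE, "double", "Double"));
    }
}
```

For `("int", "Int")` the result declares `public class IntStack` with an `int[] buffer` field, `push(int e)` and `public int pop()`. Rule order matters: the class-name rule must run before the bare-type rule, or `KTypeStack` would never match.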

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
