[jira] Commented: (MAHOUT-253) Proposal for high performance primitive collections.

Dawid Weiss (JIRA) Sat, 25 Sep 2010 10:59:58 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914849#action_12914849
 ]


Dawid Weiss commented on MAHOUT-253:
------------------------------------

I was thinking about this myself: showing that HPPC has an advantage over the 
Colt collections in a real-world context would prove that the integration makes 
sense. Just throwing a lot of generated code into Mahout and having the Colt 
and HPPC collection classes duplicate each other's functionality (even taking 
into account the differences in design and compatibility with JUC) makes little 
sense.

Let's mark this issue invalid; judging by the mailing list, the network/Hadoop 
overhead is currently far larger than the cost of number crunching and data 
structures. If anybody needs efficient collections, HPPC is still out there and 
is Apache-licensed, so it is no major headache to import it into one's project.

> Proposal for high performance primitive collections.
> ----------------------------------------------------
>
>                 Key: MAHOUT-253
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-253
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Utils
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: hppc-1.0-dev.zip
>
>
> A proposal for a template-driven collections library (lists, sets, maps, 
> deques), with specializations for Java primitive types to save memory and 
> increase performance. The "templates" are regular Java classes written with 
> generics and certain "intrinsics", that is blocks replaceable by a 
> regexp-preprocessor. This lets one write the code once, immediately test it 
> (tests are also templates) and generate primitive versions from a single 
> source.
> An additional interesting part is the benchmarking subsystem written on top 
> of JUnit ;)
> There are major differences from the Java Collections API, most notably no 
> interfaces and interface-compatible views over sub-collections or key/value 
> sets. These classes also expose their internal implementation (buffers, 
> addressing, etc.) so that the code can be optimized for a particular use case.
> These motivations are further discussed here, together with an API overview.
> http://www.carrot-search.com/download/hppc/index.html
> I am curious what you think about it. If folks like it, Carrot Search will 
> donate the code to Mahout (or Apache Commons?) and will maintain it (because 
> we plan to use it in our internal projects anyway).
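
For readers unfamiliar with the approach, the "write once as a generic template, 
then generate primitive versions with a regexp preprocessor" idea from the issue 
description can be sketched roughly as follows. This is only an illustration 
under stated assumptions, not HPPC's actual generator: the class name 
TemplateSketch, the placeholder type name KType, the toy KTypeStack template, 
and the specialize helper are all hypothetical and chosen for this example.

```java
// A minimal sketch of regexp-driven template specialization.
// Assumption: the template marks its element type with the placeholder "KType";
// the real HPPC preprocessor and its intrinsics are not reproduced here.
class TemplateSketch {
    // Hypothetical template source: a regular generic Java class.
    static final String TEMPLATE =
        "public class KTypeStack<KType> {\n" +
        "    private KType[] buffer;\n" +
        "    public void push(KType e) { /* ... */ }\n" +
        "}\n";

    // Produces the source of a primitive specialization, e.g. for "int".
    static String specialize(String template, String primitive) {
        String capitalized =
            Character.toUpperCase(primitive.charAt(0)) + primitive.substring(1);
        // 1. Drop the generic parameter declaration: "<KType>" disappears.
        String out = template.replaceAll("<KType>", "");
        // 2. Rename classes: "KTypeStack" becomes "IntStack".
        out = out.replaceAll("\\bKType(?=[A-Z])", capitalized);
        // 3. Replace the remaining standalone placeholders with the primitive.
        out = out.replaceAll("\\bKType\\b", primitive);
        return out;
    }

    public static void main(String[] args) {
        // Print the generated source for the int specialization.
        System.out.print(specialize(TEMPLATE, "int"));
    }
}
```

Because the template itself is valid generic Java, it can be compiled and 
tested directly before any primitive versions are generated, which is the 
property the proposal highlights.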

