[
https://issues.apache.org/jira/browse/GIRAPH-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Reisman updated GIRAPH-226:
-------------------------------
Attachment: (was: GIRAPH-226-1.patch)
> Proposal for per-Mapper caching of all Writable values using existing maven
> imports
> -----------------------------------------------------------------------------------
>
> Key: GIRAPH-226
> URL: https://issues.apache.org/jira/browse/GIRAPH-226
> Project: Giraph
> Issue Type: New Feature
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Priority: Minor
> Attachments: GIRAPH-226-2.patch
>
>
> We already import the Guava library into our Maven repo (see GIRAPH-225) I
> have written a _proposed_ caching system using their very efficient
> com.google.common.cache library. It would exist as a static singleton
> per-Mapper (per JVM) and would be usable by all vertices in a given
> partition/JVM environment by inclusion of an instance field in
> BasicVertex<I,V,E,M> or perhaps as part of a Context etc. (something global
> to each JVM would do.) Through a simple API one could manipulate and
> "create/get" all Writable instances used by that JVM without duplicating
> object all the time. The net effect would be similar to the recent
> improvement to NullWritable, but would cover everything. Please see the
> patch, it does not attempt to inject this cache into its new home yet, just
> places the files in "lib/" for your review and comments.
> Experiments to come will reveal whether this is a desperately needed
> improvement or just a detail as far as Giraph scale-out is concerned, but if
> it is, here it is. One caveat (I would be happy to make the minimal changes
> to existing example code/tests and our web instructions) is that the API for
> using Writables would change slightly. All mutation and creation/aquisition
> of Writable instances would be via the cache.getWritable(), which is
> overridden to easily accept all Java types that map to Writables without any
> work for the user. In fact, using this API would eliminate the need to use
> the "new" operator with Writables in any way. Best of all, should a new user
> author an application without using the cache, it would be bloated (as now)
> memory-wise but would not break in the least. There is a better explanation
> in the code comments for GiraphWritableCache, the main file.
> One could easily upgrade this to take advantage of generics by using a
> Configuration object to init this cache, and borrowing its <I,V,E,M> class
> object for Writable instantiation, but this would require more overhead
> within the cache itself, and doesn't save much code it turns out because you
> still have to concretely implement the cache loading methods with concrete
> type params. Although the main object contains one sub-cache for each
> Java-to-Writable mapping we use in Giraph/Hadoop, they are instantiated
> lazily and in most vertex implementations would never be instantiated for
> more than 1 or 2 of the possible Writables.
> ArrayWritable is not supported yet, I will be posting a separate JIRA about
> this. It turns out, ArrayWritable does not play nice with GiraphJob.run() no
> matter how you subclass or manipulate it, and twice now vertex
> implementations of mine have had to store final values in Text or some other
> unfortunate format to express tuples. This would make Multigraphs (as is
> being discussed currently in another Jira by Allessandro) impossible unless
> fixed. Thanks to Sean Choi for pointing me toward this (I think larger)
> problem. More to follow.
> Anyway, a quick morning grep reveals no code in Giraph is using
> ArrayWritables yet anyhow, so for now this doesn't affect the cache. Please
> look at this code, read the comments about use, and please tell me what you
> think. NO biggie to be if we don't use it, but again...here it is if we want
> it. I look forward to hearing from you.
> For the record, I think it would live as a static field in
> BasicVertex<I,V,E,M> or GiraphJob, etc.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira