[ 
https://issues.apache.org/jira/browse/GIRAPH-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404683#comment-13404683
 ] 

Avery Ching commented on GIRAPH-226:
------------------------------------

I think it's a nice idea in concept, but wondering how much in practice there 
would be a benefit?  Eli, won't the cache only work with Immutable objects?  
LongWritable, FloatWritable, etc are all mutable.  Also, for float values, I 
wouldn't expect too much of an improvement.
                
> Proposal for per-Mapper caching of all Writable values using existing maven 
> imports
> -----------------------------------------------------------------------------------
>
>                 Key: GIRAPH-226
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-226
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>         Attachments: GIRAPH-226-2.patch
>
>
> We already import the Guava library into our Maven repo (see GIRAPH-225) I 
> have written a _proposed_ caching system using their very efficient 
> com.google.common.cache library. It would exist as a static singleton 
> per-Mapper (per JVM) and would be usable by all vertices in a given 
> partition/JVM environment by inclusion of an instance field in 
> BasicVertex<I,V,E,M> or perhaps as part of a Context etc. (something global 
> to each JVM would do.) Through a simple API one could manipulate and 
> "create/get" all Writable instances used by that JVM without duplicating 
> object all the time. The net effect would be similar to the recent 
> improvement to NullWritable, but would cover everything. Please see the 
> patch, it does not attempt to inject this cache into its new home yet, just 
> places the files in "lib/" for your review and comments.
> Experiments to come will reveal whether this is a desperately needed 
> improvement or just a detail as far as Giraph scale-out is concerned, but if 
> it is, here it is. One caveat (I would be happy to make the minimal changes 
> to existing example code/tests and our web instructions) is that the API for 
> using Writables would change slightly. All mutation and creation/aquisition 
> of Writable instances would be via the cache.getWritable(), which is 
> overridden to easily accept all Java types that map to Writables without any 
> work for the user. In fact, using this API would eliminate the need to use 
> the "new" operator with Writables in any way. Best of all, should a new user 
> author an application without using the cache, it would be bloated (as now) 
> memory-wise but would not break in the least. There is a better explanation 
> in the code comments for GiraphWritableCache, the main file.
> One could easily upgrade this to take advantage of generics by using a 
> Configuration object to init this cache, and borrowing its <I,V,E,M> class 
> object for Writable instantiation, but this would require more overhead 
> within the cache itself, and doesn't save much code it turns out because you 
> still have to concretely implement the cache loading methods with concrete 
> type params. Although the main object contains one sub-cache for each 
> Java-to-Writable mapping we use in Giraph/Hadoop, they are instantiated 
> lazily and in most vertex implementations would never be instantiated for 
> more than 1 or 2 of the possible Writables.
> ArrayWritable is not supported yet, I will be posting a separate JIRA about 
> this. It turns out, ArrayWritable does not play nice with GiraphJob.run() no 
> matter how you subclass or manipulate it, and twice now vertex 
> implementations of mine have had to store final values in Text or some other 
> unfortunate format to express tuples. This would make Multigraphs (as is 
> being discussed currently in another Jira by Allessandro) impossible unless 
> fixed. Thanks to Sean Choi for pointing me toward this (I think larger) 
> problem. More to follow.
> Anyway, a quick morning grep reveals no code in Giraph is using 
> ArrayWritables yet anyhow, so for now this doesn't affect the cache. Please 
> look at this code, read the comments about use, and please tell me what you 
> think. NO biggie to be if we don't use it, but again...here it is if we want 
> it. I look forward to hearing from you. 
> For the record, I think it would live as a static field in 
> BasicVertex<I,V,E,M> or GiraphJob, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to