My instinct is that you want to start from one of Giraph's higher-level abstractions, such as VertexInputFormat, instead of a raw Hadoop InputFormat.
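As a rough sketch of what that could look like: the class names RexsterVertexInputFormat and RexsterVertexReader below are hypothetical, as are the chosen I/V/E Writable types; the overridden methods are the ones Giraph's VertexInputFormat declares (check the version you're building against, since the exact signatures may differ).

```java
import java.io.IOException;
import java.util.List;
import org.apache.giraph.io.VertexInputFormat;
import org.apache.giraph.io.VertexReader;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical skeleton of a Rexster-backed input format that extends
// Giraph's VertexInputFormat rather than a Hadoop InputFormat.
public class RexsterVertexInputFormat
    extends VertexInputFormat<LongWritable, DoubleWritable, FloatWritable> {

  @Override
  public List<InputSplit> getSplits(JobContext context, int minSplitCountHint)
      throws IOException, InterruptedException {
    // Partition the graph source into splits however makes sense for the
    // backend (e.g. vertex-id ranges), without touching Hadoop InputFormats.
    throw new UnsupportedOperationException("TODO: compute splits");
  }

  @Override
  public VertexReader<LongWritable, DoubleWritable, FloatWritable>
      createVertexReader(InputSplit split, TaskAttemptContext context)
      throws IOException {
    // The reader (elided here) would fetch vertices for its split, e.g. over
    // Rexster's REST API, and hand them to Giraph one at a time.
    return new RexsterVertexReader();
  }
}
```

The point being: Giraph drives input through VertexInputFormat/VertexReader, so extending those keeps you inside Giraph's abstraction instead of reimplementing Hadoop's split machinery the way Faunus does.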
On Thu, Jul 11, 2013 at 4:47 PM, Renato Marroquín Mogrovejo <[email protected]> wrote:

> Hi Armando,
>
> I really understand what you're saying about the input formats, because I am
> also writing an integration with Apache Gora and I am facing the same
> problems. This is because Gora does not rely directly on Hadoop input
> formats, but Giraph does.
> I think an alternative would be to write an abstraction for input formats
> which would have to be agnostic to how data is serialized. In this way,
> Giraph could read and write data from any data source without directly
> depending on Hadoop's input formats.
> On the other hand, we could extend Hadoop input formats and let them live in
> their corresponding modules. IMHO the former option would be the better
> choice for extensibility and modularity.
>
> Renato M.
>
> Hi guys,
>
> I am currently trying to implement a PoC for the issue GIRAPH-549 (which,
> btw, is the main topic of my GSoC project).
>
> As suggested in the issue by Claudio, I looked at the Faunus
> implementation to connect to Rexster and get the data, but at the moment
> I am overwhelmed by all the available classes.
>
> My question is the following: Faunus' approach is to create an
> InputFormat extending directly from the Hadoop InputFormat class. I saw,
> however, that some classes in Giraph extend directly from Hadoop
> classes, while others extend from VertexInputFormat (like
> TextVertexInputFormat). So what would be the best choice I could make? I
> started extending VertexInputFormat, but an opinion from you would be
> very much appreciated.
>
> If you need any additional details, just let me know.
>
> Cheers,
> Armando
