Hi, yes, let's discuss this, maybe off-line from this list to get to the point quickly.
I'll also be happy to help (if I can).

Sebastian

On 11/18/2015 01:01 AM, Michael Joyce wrote:
> I would be happy to help with this (although that's probably obvious from
> the above email), but I realize we'll probably want to chat a bit about
> this. It's certainly not a small change =)
>
> -- Jimmy
>
> On Sat, Nov 14, 2015 at 2:28 AM, Lewis John Mcgibbney
> <[email protected]> wrote:
>
> > Hi Folks,
> >
> > Mike Joyce and myself have been working on a Tinkerpop implementation of
> > Node and NodeDB (generated through WebGraph) which builds a Vertex input,
> > used by Tinkerpop, subsequently Gremlin and persisted into a graph
> > database such as TitanDB.
> > We have analyzed the problem quite a bit and came across the following
> > I/O formats:
> >
> > http://tinkerpop.incubator.apache.org/docs/3.0.1-incubating/#script-io-format
> >
> > I've implemented a PropertyWebGraphVertex writable in Nutch which builds
> > off of NodeDB (and others) to enable us to write out to the
> > ScriptOutputFormat. Essentially we address the issues of parent-child vs.
> > child-parent, e.g. Outlinks vs. Inlinks respectively.
> > The work from there then consists of an external process (to Nutch)
> > invoking a Groovy script from within Gremlin to ingest data into TitanDB.
> > During the course of this work we have realized that mapred and mapreduce
> > APIs are NOT ok within trunk if we want to move Nutch to accommodate the
> > above described architecture.
> >
> > Breath of fresh air and a deep breath...
> >
> > What do you guys think about branching trunk into a 3.X branch with every
> > mapred --> mapreduce package addressed?
> > Mike, Sujen and myself talked today. We want to touch base with everyone
> > within dev@ as it lends itself very much to the work undertaken by
> > https://issues.apache.org/jira/browse/NUTCH-2097
> >
> > It does not however totally rearrange the codebase. It will however
> > generate a genuine graph output based upon
> >
> > http://tinkerpop.incubator.apache.org/docs/3.0.1-incubating/#script-io-format
> >
> > We can have a gremlin script as part of $NUTCH_HOME/conf which merely
> > ingests data (along with a config file) to a GraphDB such as Titan.
> >
> > What does everyone think?
> > Thanks
> > Lewis
> >
> > --
> > /Lewis/

