Hi,

Yes, let's discuss this, maybe off-list, to get to the point quickly.

I'll also be happy to help (if I can).

Sebastian

On 11/18/2015 01:01 AM, Michael Joyce wrote:
> I would be happy to help with this (although that's probably obvious from the
> above email), but I realize we'll probably want to chat a bit about this. It's
> certainly not a small change =)
> 
> 
> -- Jimmy
> 
> On Sat, Nov 14, 2015 at 2:28 AM, Lewis John Mcgibbney <[email protected]> wrote:
> 
>     Hi Folks,
> 
>     Mike Joyce and I have been working on a TinkerPop implementation of Node and
>     NodeDB (generated through WebGraph) which builds a Vertex input that is used by
>     TinkerPop and subsequently Gremlin, and is persisted into a graph database such
>     as TitanDB.
>     We have analyzed the problem quite a bit and came across the following I/O
>     formats:
>     http://tinkerpop.incubator.apache.org/docs/3.0.1-incubating/#script-io-format
>     I've implemented a PropertyWebGraphVertex writable in Nutch which builds off of
>     NodeDB (and others) to enable us to write out to the ScriptOutputFormat.
>     Essentially, we address the issue of parent-child vs. child-parent
>     relationships, i.e. Outlinks vs. Inlinks respectively.
>     The work from there consists of a process external to Nutch invoking a Groovy
>     script from within Gremlin to ingest data into TitanDB.
>     During the course of this work we have realized that the mapred and mapreduce
>     APIs within trunk are NOT OK if we want to move Nutch to accommodate the
>     architecture described above.
> 
>     Breath of fresh air and a deep breath...
> 
>     What do you guys think about branching trunk into a 3.X branch with every
>     mapred --> mapreduce package addressed?
>     Mike, Sujen and I talked today. We want to touch base with everyone on dev@,
>     as this lends itself very much to the work undertaken in
>     https://issues.apache.org/jira/browse/NUTCH-2097
> 
>     It does not, however, totally rearrange the codebase. It will, however,
>     generate a genuine graph output based upon
>     http://tinkerpop.incubator.apache.org/docs/3.0.1-incubating/#script-io-format
>     We can have a Gremlin script as part of $NUTCH_HOME/conf which merely ingests
>     data (along with a config file) into a GraphDB such as Titan.
> 
>     What does everyone think?
>     Thanks
>     Lewis
> 
>     -- 
>     /Lewis/
> 
> 
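For illustration only, here is a minimal Java sketch of the outlink/inlink idea discussed above: each web-graph node is written as one text line carrying both its outlinks (parent to child edges) and its inlinks (child to parent edges), so that a parse script on the Gremlin side could rebuild a vertex with edges in both directions. This is not the PropertyWebGraphVertex code and not the exact ScriptOutputFormat line layout. The class name `WebGraphVertexSketch`, the tab-separated `url<TAB>outlinks<TAB>inlinks` layout and the `-` placeholder for an empty list are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not PropertyWebGraphVertex): writes one web-graph node,
 * i.e. a URL with its outlinks and inlinks, as a single text line.
 * Assumed layout: url TAB outlinks TAB inlinks, links comma-separated,
 * "-" standing in for an empty link list.
 */
public final class WebGraphVertexSketch {

    private WebGraphVertexSketch() {}

    /** Comma-joins links; an empty list becomes "-" so every line has three fields. */
    public static String joinOrDash(List<String> links) {
        return links.isEmpty() ? "-" : String.join(",", links);
    }

    /** Inverse of joinOrDash: "-" means no links. */
    public static List<String> splitLinks(String field) {
        return field.equals("-") ? List.of() : Arrays.asList(field.split(","));
    }

    /** Outlinks model parent -> child edges; inlinks model child -> parent edges. */
    public static String toScriptLine(String url, List<String> outlinks, List<String> inlinks) {
        return url + "\t" + joinOrDash(outlinks) + "\t" + joinOrDash(inlinks);
    }

    public static void main(String[] args) {
        String line = toScriptLine(
                "http://a.example/",
                List.of("http://b.example/", "http://c.example/"),
                List.of());
        System.out.println(line);

        // Read the line back the way a parse script on the Gremlin side might.
        String[] fields = line.split("\t", -1);
        System.out.println("vertex:   " + fields[0]);
        System.out.println("outlinks: " + splitLinks(fields[1]));
        System.out.println("inlinks:  " + splitLinks(fields[2]));
    }
}
```

Keeping both link directions on the same line means the ingesting script never has to join records to find a node's inlinks. URLs that themselves contain commas or tabs would need escaping, which this sketch deliberately skips.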
