Hi Rich,

If you are a bit familiar with Java, you can also use the batch-inserter API 
yourself to implement the things you need.

This also applies to other JVM languages, like JRuby, Jython, JavaScript, 
Scala, Clojure, Groovy etc.


On 12.05.2014 at 09:49, Rich Morin <[email protected]> wrote:

> I need to import ~300 million RDF triples from YAGO2s, a mechanically-
> generated ontology.  The Batch Importer (preferably the 2.0 version)
> is an obvious candidate for this task, if I can figure out some pesky
> usage details.  Help?
> 
> -r
> 
> Background
> 
>  If a triple defines a relation between subject and object URIs, I
>  can express it as a Neo4j relationship.
> 
>  However, many triples define values (eg, hasLatitude) for entities.
>  I'd like to express these as node properties, but the Batch Importer
>  uses TSV syntax, which has a fixed set of properties per node.
Yep, good insight, you don't want to store those value triples as relationships.
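
A rough sketch of that split in plain Java, in case it helps. It only sorts triples into "becomes a node property" and "becomes a relationship"; it does not touch Neo4j at all. The class name TripleSorter is made up, and the rule "an object that starts with a double quote is a literal" is an assumption about your input format that you would need to check against the YAGO2s files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Neo4j or the Batch Importer.
final class TripleSorter {

    // node key -> (predicate -> literal value); these become node properties
    final Map<String, Map<String, Object>> nodeProperties = new LinkedHashMap<>();

    // {subject, predicate, object} triples; these become relationships
    final List<String[]> relationships = new ArrayList<>();

    // ASSUMPTION: a literal object starts with a double quote,
    // anything else names another entity.
    void add(String subject, String predicate, String object) {
        // every subject becomes a node, even without literal values
        Map<String, Object> props =
                nodeProperties.computeIfAbsent(subject, k -> new LinkedHashMap<>());
        if (object.startsWith("\"")) {
            // a repeated predicate overwrites the earlier value here
            props.put(predicate, object);
        } else {
            // the target entity needs a node as well
            nodeProperties.computeIfAbsent(object, k -> new LinkedHashMap<>());
            relationships.add(new String[] {subject, predicate, object});
        }
    }

    public static void main(String[] args) {
        TripleSorter sorter = new TripleSorter();
        // made-up sample triples, for illustration only
        sorter.add("<San_Bruno>", "hasLatitude", "\"37.63\"");
        sorter.add("<San_Bruno>", "isLocatedIn", "<California>");
        System.out.println(sorter.nodeProperties);
        System.out.println(sorter.relationships.size() + " relationship(s)");
    }
}
```

With ~300 million triples you probably cannot hold everything in one map like this. If the input is grouped by subject, you could collect the properties for one subject at a time and write them out before moving on.
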
> 
> Questions
> 
>  Q:  If I define properties in the TSV header, but leave the data
>      fields empty, what will the Batch Importer do?  For example:
> 
>        name       works_on    works_in
>        Michael    neo4j       Java
>        Richard                Ruby
>        Xavier             

Yes, it skips empty cells, so no property is created for them.

> 
>      Would this create the following nodes?
> 
>        Michael:
>          works_on:  neo4j
>          works_in:  Java
>        Richard:
>          works_in:  Ruby
>        Xavier:
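
To make the empty-cell behaviour concrete, here is a minimal sketch in plain Java of "skip empty cells" applied to your example rows. This is not the Batch Importer's own code, just an illustration of the idea; the class and method names (TsvRows, toProperties) are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only, not the Batch Importer's implementation.
final class TsvRows {

    // Turns one tab-separated data line into a property map,
    // leaving out every column whose cell is empty or blank.
    static Map<String, Object> toProperties(String[] header, String line) {
        // limit -1 keeps trailing empty cells instead of dropping them
        String[] cells = line.split("\t", -1);
        Map<String, Object> props = new LinkedHashMap<>();
        for (int i = 0; i < header.length && i < cells.length; i++) {
            String value = cells[i].trim();
            if (!value.isEmpty()) {
                props.put(header[i], value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        String[] header = {"name", "works_on", "works_in"};
        String[] lines = {
            "Michael\tneo4j\tJava",
            "Richard\t\tRuby",
            "Xavier\t\t"
        };
        for (String line : lines) {
            System.out.println(toProperties(header, line));
        }
        // prints:
        // {name=Michael, works_on=neo4j, works_in=Java}
        // {name=Richard, works_in=Ruby}
        // {name=Xavier}
    }
}
```
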
> 
>  Q:  If I have already used the Batch Importer to define nodes and
>      relationships, can I use it again to simply add properties?
> 
>        name       speaks
>        Michael    German
>        Richard    English

Unfortunately not; it is really meant for inserts. Theoretically it would be 
possible, but I'm not sure about the performance overhead.

> 
>      Given that the nodes file no longer has ID numbers, how do I
>      tell the Batch Importer which entities to modify?

If it did work, you could name the properties to look up in an index and 
then use those to find and update the nodes. But index read performance is 
much slower than batch-inserter write performance.

Usually what I'd do is programmatically read all nodes of the graph and 
store the relevant lookup property (e.g. url) and the node-id in a Map or sorted 
array. Then you can quickly find the node-id for a given lookup value and 
update the node.
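
Roughly, the Map variant could look like the sketch below. It is plain Java with no Neo4j dependency: PropertyWriter is a hypothetical stand-in for whatever actually writes to the store (with the batch-inserter API that would be its node-property setter, but please check the javadoc for your version), and NodeLookup, remember and update are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps a lookup value (e.g. url or name) to a node-id.
final class NodeLookup {

    // Stand-in for the component that writes a property to the store.
    interface PropertyWriter {
        void setNodeProperty(long nodeId, String key, Object value);
    }

    private final Map<String, Long> idByLookupValue = new HashMap<>();

    // Call once per existing node while scanning the graph.
    void remember(String lookupValue, long nodeId) {
        idByLookupValue.put(lookupValue, nodeId);
    }

    // Returns true if the node was found and updated,
    // false if the lookup value is unknown.
    boolean update(String lookupValue, String property, Object value,
                   PropertyWriter writer) {
        Long nodeId = idByLookupValue.get(lookupValue);
        if (nodeId == null) {
            return false;
        }
        writer.setNodeProperty(nodeId, property, value);
        return true;
    }

    public static void main(String[] args) {
        NodeLookup lookup = new NodeLookup();
        // the node-ids here are invented for the example
        lookup.remember("Michael", 0L);
        lookup.remember("Richard", 1L);

        // this writer only prints; a real one would write to the store
        PropertyWriter printer = (id, key, value) ->
                System.out.println("node " + id + ": " + key + "=" + value);

        lookup.update("Michael", "speaks", "German", printer);
        lookup.update("Richard", "speaks", "English", printer);
    }
}
```

For hundreds of millions of nodes a HashMap of Strings gets large, which is where the sorted-array variant becomes attractive.
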

HTH,

Michael

> 
> -- 
> http://www.cfcl.com/rdm           Rich Morin           [email protected]
> http://www.cfcl.com/rdm/resume    San Bruno, CA, USA   +1 650-873-7841
> 
> Software system design, development, and documentation
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

