Hi all,

@Niclas: thanks a lot for your efforts in providing an up-to-date SPARQL extension for Neo4j.
On Thursday, November 20, 2014 5:33:47 PM UTC+1, Michael Hunger wrote:
>
> That's what I meant with the misfit of modeling RDF data 1:1 into the
> property graph instead of having a "sensible" mapping of only real entities
> to nodes, real semantic tuples to relationships and everything else to
> properties.
>

I have also often thought about this approach to mapping RDF onto the property graph model. However, I don't think it would really scale, because you would usually double the cost of your query (afaik): for a given attribute you need to look at both the node's properties and its relationships, since you cannot always assume that an attribute yields only literals or only resources (which you would need to know beforehand in order to request just one of the two). Furthermore, (afaik) node properties are not designed to store lists of values, so storing multiple literal values in one node property (i.e., for one attribute) requires further processing steps.

So I would tend to say that it's best to store everything as a (node)-[edge]->(node) relationship, which aligns perfectly with the basic structure of an RDF statement (a simple subject-predicate-object sentence). Then it doesn't matter whether you are querying for statements with literal values or statements with resource values.

Moreover, you can deal (better) with metadata (external context, ...) about statements, since you can also add properties to the edges (relationships). For example, you can do statement-based versioning or clustering/partitioning (e.g., à la Named Graphs), introduce qualified attributes for ordering, or simply add a unique identifier for the statement itself. So you can design a (graph) data model with more comprehensive capabilities than RDF ;) (because of the flexibility of the property graph model).
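To make the statement-per-relationship idea a bit more concrete, here is a minimal in-memory sketch in Python. It is neither the Neo4j API nor the code of our extension; the class names, the property keys (`statement_id`, `provenance`, `order`) and the `ex:`/`dc:` example identifiers are all made up for illustration. It only shows that literals and resources can be reached through the same lookup, and that statement-level metadata can live on the edge.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Node:
    """A graph node: either an RDF resource or a literal value."""
    value: str
    is_literal: bool = False


@dataclass
class Edge:
    """One RDF statement stored as subject -[predicate]-> object."""
    subject: Node
    predicate: str
    obj: Node
    # Statement-level metadata; the keys used below are hypothetical.
    props: dict = field(default_factory=dict)


class StatementGraph:
    """Toy store that keeps every statement as an edge with properties."""

    def __init__(self):
        self.edges = []
        self._ids = count(1)

    def add(self, s, p, o, *, literal=False, provenance=None, order=None):
        props = {
            "statement_id": next(self._ids),  # unique id per statement
            "provenance": provenance,         # e.g. a Named-Graph-like context
            "order": order,                   # optional position among siblings
        }
        edge = Edge(Node(s), p, Node(o, is_literal=literal), props)
        self.edges.append(edge)
        return edge

    def objects(self, s, p, provenance=None):
        """One lookup path, regardless of literal or resource objects."""
        hits = [
            e for e in self.edges
            if e.subject.value == s
            and e.predicate == p
            and (provenance is None or e.props["provenance"] == provenance)
        ]
        # Ordered statements first (by "order"), unordered ones afterwards.
        hits.sort(key=lambda e: (e.props["order"] is None, e.props["order"] or 0))
        return [e.obj for e in hits]


g = StatementGraph()
g.add("ex:book1", "dc:title", "Graphdatenbanken", literal=True,
      provenance="ex:graphA", order=2)
g.add("ex:book1", "dc:title", "Graph Databases", literal=True,
      provenance="ex:graphA", order=1)
g.add("ex:book1", "dc:creator", "ex:person1", provenance="ex:graphA")
g.add("ex:book1", "dc:creator", "ex:person2", provenance="ex:graphB")

titles = g.objects("ex:book1", "dc:title")
creators_in_b = g.objects("ex:book1", "dc:creator", provenance="ex:graphB")
print([n.value for n in titles])
print([n.value for n in creators_in_b])
```

Multiple values for one attribute are simply multiple edges here, so no list handling inside a node property is needed, and the `provenance` filter stands in for partitioning by context.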
Finally, you can create indices as necessary, e.g., for resources (nodes), for statements (relationships) or for literals, to speed up queries.

Last but not least, we implemented this approach (prototypically (?)) as a Neo4j unmanaged extension that can be found at https://github.com/dswarm/dswarm-graph-neo4j

More details about the design of the graph data model can be found at https://github.com/dswarm/dswarm-documentation/wiki/Graph-Data-Model and https://github.com/dswarm/dswarm-documentation/wiki/Graph-Exploration

We are happy about every kind of feedback and are looking forward to an interesting discussion about RDF-based graph data models mapped onto the property graph data model ;)

Cheers,

Bo

PS: we also experimented with batch import, see https://github.com/dswarm/dswarm-graph-neo4j/tree/master/src/main/java/org/dswarm/graph/batch

> It would be stellar to resolve that in a good way with a sensible default
> mapping that might be augmented.
> Wes and I discussed that when importing Freebase Data into Neo4j
>
> Michael
>
> On Thu, Nov 20, 2014 at 4:13 PM, Andrii Stesin <[email protected]> wrote:
>
>> On Thursday, November 13, 2014 1:11:47 PM UTC+2, Niclas Hoyer wrote:
>>>
>>> Fuseki uses ~ 9 GB disk space after import, but Neo4j allocated 390 GB.
>>> That also results in about 27 times slower query execution on this large
>>> dataset.
>>>
>>
>> I suspect some data modelling issue here... the difference is way bigger
>> than one can expect. Factor of 10 won't make me wonder too much, but 40+ ??
>> why and how?
>>
>>> Using the smallest dataset with just 2 MB Neo4j is just 2.4 times slower
>>> than Fuseki.
>>>
>>
>> This also makes me wonder, does Neo4j introduce so big an overhead
>> compared to Fuseki? (small example should completely fit in memory, doesn't
>> it?)
>>
>> WBR,
>> Andrii
>>

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
