Re: [Neo4j] LUBM benchmark: import and Cypher queries

Manish Rai Jain Tue, 08 Dec 2015 15:57:07 -0800

Hey Martin,

What was the outcome of this? Did you manage to run the benchmarks on Neo4J?


Cheers,
Manish


On Friday, June 28, 2013 at 10:45:40 PM UTC+10, Michael Hunger wrote:
>
> Hi Martin,
>
> that sounds great, please keep us in the loop about your progress, happy to
> support you whenever.
>
> Michael
>
>
> On Thu, Jun 27, 2013 at 7:04 PM, Martin Bravenboer
> <[email protected]> wrote:
>
>> Hi Michael,
>>
>> Thanks a lot for your help!
>>
>> The idea for a 'type' index worked very well. I was able to tune the
>> slowest queries (which included q2) to be about 10x faster. The nested
>> MATCH did not help in this particular case, but did help for other
>> queries.
>>
>> I'm working on moving the benchmark into a public repository, and do
>> also have some follow-up questions to understand the execution plans.
>> I briefly wanted to get back to you already to thank you for your help
>> while I'm working on that.
>>
>> Cheers,
>> Martin
>>
>>
>> On Wed, Jun 26, 2013 at 12:16 AM, Michael Hunger
>> <[email protected]> wrote:
>> > Thanks so much Martin for working on that,
>> >
>> > I have no knowledge of LUBM so please excuse my ignorance.
>> >
>> > Can you say something about the datamodel and cardinalities in general?
>> >
>> > It would be great if you could share the imported zipped database for
>> > instance on dropbox, so we could take a stab at optimizing the queries.
>> >
>> >
>> > Query 2 is an OLAP-style query, something Neo4j is not optimized for per se.
>> >
>> > Especially for query 2 it might be interesting to index nodes by type too.
>> >
>> > So it would become:
>> >
>> > START
>> >   student = node:types(type = 'GraduateStudent')
>> > MATCH
>> >   student-[:memberOf]->dept,
>> >   dept-[:subOrganizationOf]->univ,
>> >   student-[:undergraduateDegreeFrom]->univ
>> > RETURN
>> >   student, dept, univ
>> >
>> >
>> > I would probably rewrite the query to start at the universities, since
>> > there are fewer of them to use as starting nodes:
>> >
>> > START
>> >   univ = node:types(type = 'University')
>> > MATCH
>> >   student-[:memberOf]->dept,
>> >   dept-[:subOrganizationOf]->univ,
>> >   student-[:undergraduateDegreeFrom]->univ
>> > RETURN
>> >   student, dept, univ
>> >
>> > Something else one might try is to break down the match into individual
>> > matches and thereby reduce the amount of data processed in flight.
>> >
>> > Something like this:
>> >
>> > START
>> >   univ = node:types(type = 'University')
>> > MATCH
>> >   dept-[:subOrganizationOf]->univ
>> > WITH univ, collect(dept) as depts
>> >
>> > MATCH
>> >   student-[:undergraduateDegreeFrom]->univ
>> > WHERE ANY(dept in depts : student-[:memberOf]->dept)
>> > RETURN
>> >   student, univ
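
To make the shape of the rewritten query easier to follow, here is a minimal plain-Java sketch of the same "triangle" evaluated from the university side. It does not use the Neo4j API: the class name `Query2Sketch` and the three maps are hypothetical stand-ins for the relationships, and, like the rewritten query above, it does not filter students by type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: plain collections stand in for the graph.
final class Query2Sketch {
    // Returns {student, dept, univ} triples where the student is a member of a
    // department that belongs to the university that granted the student's
    // undergraduate degree.
    static List<String[]> run(
            Map<String, Set<String>> deptsOfUniv,      // univ -> depts (subOrganizationOf, reversed)
            Map<String, Set<String>> alumniOfUniv,     // univ -> students (undergraduateDegreeFrom, reversed)
            Map<String, Set<String>> deptsOfStudent) { // student -> depts (memberOf)
        List<String[]> result = new ArrayList<>();
        // Start from the universities, the smallest set of starting points.
        for (Map.Entry<String, Set<String>> entry : deptsOfUniv.entrySet()) {
            String univ = entry.getKey();
            Set<String> depts = entry.getValue(); // plays the role of collect(dept)
            for (String student : alumniOfUniv.getOrDefault(univ, Collections.emptySet())) {
                for (String dept : deptsOfStudent.getOrDefault(student, Collections.emptySet())) {
                    // Keep only memberships in a department of this university.
                    if (depts.contains(dept)) {
                        result.add(new String[] {student, dept, univ});
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deptsOfUniv = new HashMap<>();
        deptsOfUniv.put("Univ0", new HashSet<>(Arrays.asList("Dept0")));
        Map<String, Set<String>> alumniOfUniv = new HashMap<>();
        alumniOfUniv.put("Univ0", new HashSet<>(Arrays.asList("alice")));
        Map<String, Set<String>> deptsOfStudent = new HashMap<>();
        deptsOfStudent.put("alice", new HashSet<>(Arrays.asList("Dept0")));
        for (String[] row : run(deptsOfUniv, alumniOfUniv, deptsOfStudent)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

The point of the sketch is only the iteration order: each university's departments are gathered once and reused for every candidate student, rather than expanding the whole pattern from every student.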
>> >
>> > Regarding your import code: it looks good, very clean.
>> > As stated, I would try to use an index to index nodes per type.
>> > You should batch transactions in groups of 20-30k elements.
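
The batching advice above can be sketched in plain Java as follows. This is only an illustration under assumptions: `BatchedImport` and `importInBatches` are hypothetical names, not part of the Neo4j API or of the attached Main.java, and the callback stands in for "open a transaction, write the batch, commit".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: groups work items into fixed-size batches so that each
// batch can be written and committed in its own transaction.
final class BatchedImport {
    // Splits `items` into consecutive batches of at most `batchSize` elements,
    // hands each batch to `commitBatch`, and returns the number of batches.
    static <T> int importInBatches(List<T> items, int batchSize, Consumer<List<T>> commitBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // In a real import, one transaction would be opened here, the
            // batch written, and the transaction committed.
            commitBatch.accept(items.subList(from, to));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            rows.add(i);
        }
        int batches = importInBatches(rows, 20_000,
                batch -> System.out.println("committing " + batch.size() + " elements"));
        System.out.println(batches + " batches");
    }
}
```

With 50,000 rows and a batch size of 20,000, the callback runs three times (20,000, 20,000 and 10,000 elements), so no single transaction has to hold the whole import in memory.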
>> >
>> >
>> >
>> > On Tue, Jun 25, 2013 at 5:51 AM, Martin Bravenboer
>> > <[email protected]> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> To better understand the capabilities of graph databases, I'm working
>> >> on porting the LUBM benchmark (
>> >> http://swat.cse.lehigh.edu/projects/lubm/ ) to Neo4J. Because I'm not
>> >> yet very familiar with Neo4J, I'm looking for some general advice on
>> >> whether the approach I'm following seems wise (which is why I used the
>> >> mailing list instead of StackOverflow).
>> >>
>> >> Initially, I tried to import the generated RDF data using
>> >> Tinkerpop/BluePrints and run SPARQL queries using OpenRDF. This didn't
>> >> work out that well: the import of the large volume of RDF data
>> >> performed very poorly, to the extent that I really could not populate
>> >> a database with a reasonable scale. For this reason, I switched to
>> >> importing CSV files that we generate from the RDF data. The attached
>> >> program (Main.java) is a preliminary version of this import tool. I
>> >> also felt that the SPARQL approach would limit the tuning we can do on
>> >> the queries, and the RDF graph is so specific to RDF that it seems
>> >> hard to query it using Java or Cypher. The CSV data is imported in a
>> >> way that more closely resembles what seems to be a typical Neo4J
>> >> schema, for example basic properties like 'email' become properties of
>> >> the node, rather than separate nodes and edges. Because the LUBM
>> >> benchmark depends on some basic OWL inference capabilities, I'm also
>> >> adding some ad-hoc code to 'fix' the graph to manually do this
>> >> inference. You can see an example of this in the attached Main.java,
>> >> which is creating the proper edges for super-classes. This was also
>> >> needed in the original RDF version. This tool performs pretty nicely
>> >> now.
>> >>
>> >> Some questions I have here in this first cut of the benchmark:
>> >>
>> >> 1) I found this:
>> >> https://svn.neo4j.org/laboratory/users/johan/lubm/trunk/ , but the
>> >> implementation seemed very out of date, both in the import code, as
>> >> well as the Java-based queries. The implementation of some queries
>> >> also didn't seem very efficient, which you can see from the
>> >> spreadsheet in that repository. Are there any other LUBM
>> >> implementations around that I perhaps did not find?
>> >>
>> >> 2) Have other people also observed that importing RDF via the
>> >> BluePrints API performs significantly worse than importing a more
>> >> barebone graph using the Neo4J API directly? Is it a well-known thing
>> >> that querying via SPARQL/RDF is not the best demonstration of Neo4J's
>> >> abilities?
>> >>
>> >> 3) Do you see any bad/poorly performing patterns in the import code?
>> >> We're trying to first generate all nodes, and then separately create
>> >> the edges, to avoid having to do this for all edge data. Is that a
>> >> good pattern to follow?
>> >>
>> >> 4) I've included two Cypher queries (q1 and q2 from LUBM) that seem
>> >> like fairly faithful translations of the original queries. In q2, I had
>> >> some difficulty deciding what a good start node is. There really
>> >> isn't one, because it relates a whole bunch of nodes. This query
>> >> currently does not perform that well, do you have any suggestions to
>> >> tune it? In general I've had difficulty getting matches that relate
>> >> several nodes to perform well. Perhaps there is a better way to write
>> >> these?
>> >>
>> >> q1
>> >> ----------------
>> >> START course = node:ids(id =
>> >> 'http://www.Department0.University0.edu/GraduateCourse0')
>> >> MATCH course<-[:takesCourse]-x-[:type]->t
>> >> WHERE t.id = 'GraduateStudent'
>> >> RETURN x
>> >> ----------------
>> >>
>> >> q2
>> >> ----------------
>> >> START
>> >>   grad = node:ids(id = 'GraduateStudent')
>> >>   //univ = node:ids(id = 'University'),
>> >>   //dept = node:ids(id = 'Department')
>> >> MATCH
>> >>   x-[:type]->grad,
>> >>   //y-[:type]->univ,
>> >>   //z-[:type]->dept,
>> >>   x-[:memberOf]->z,
>> >>   z-[:subOrganizationOf]->y,
>> >>   x-[:undergraduateDegreeFrom]->y
>> >> RETURN
>> >>   x, y, z
>> >> ----------------
>> >> For reference, the Java implementation of q2 in the link above was:
>> >>
>> >> https://svn.neo4j.org/laboratory/users/johan/lubm/trunk/src/main/java/org/neo4j/lubm/barebone/Query2.java
>> >>
>> >> I very much appreciate any advice the group can share!
>> >>
>> >> Once we have some more queries running, we'll be very happy to share
>> >> the implementation of the LUBM suite so that people can review the
>> >> implementation, or perhaps even use it in the future.
>> >>
>> >> Thanks,
>> >> Martin
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google Groups
>> >> "Neo4j" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send an
>> >> email to [email protected].
>> >> For more options, visit https://groups.google.com/groups/opt_out.
>> >>
>> >>
>> >
>>
>

