Hi Michael, here is an update
- I found that I was writing NaNs into the array, which caused the
exception in yellow. I fixed this with a simple trap:
    if (Float.isFinite(lcm))
        sl.lcm[idx] = lcm;
    else
        sl.lcm[idx] = -1.0f;
- When this was fixed I could write to all 200M relationships and the db
would open in neo4j-shell - BUT it would flag an exception when exiting
& closing, i.e. it wouldn't do a clean shutdown.
- By turning off all transaction logging the db now opens and closes
without issues and all 200M 16 element float arrays are successfully
written.
So, to get this working I have to trap for NaNs and turn off transaction
logging; then my app writes all 200M property arrays and the db opens and
closes cleanly in neo4j-shell.
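
A minimal sketch of the NaN trap applied to a whole array before it is written, using only the JDK. The class and method names (FloatArraySanitizer, sanitize, countNonFinite) are hypothetical and not part of Neo4j; the -1.0f sentinel is the one used above.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Neo4j: cleans a float[] before it is stored.
final class FloatArraySanitizer {
    private FloatArraySanitizer() {}

    /** Returns a copy of values with every NaN or infinite entry replaced by sentinel. */
    static float[] sanitize(float[] values, float sentinel) {
        float[] out = Arrays.copyOf(values, values.length);
        for (int i = 0; i < out.length; i++) {
            if (!Float.isFinite(out[i])) { // false for NaN, +Infinity and -Infinity
                out[i] = sentinel;
            }
        }
        return out;
    }

    /** Counts the entries that sanitize() would replace, e.g. for logging. */
    static int countNonFinite(float[] values) {
        int n = 0;
        for (float v : values) {
            if (!Float.isFinite(v)) {
                n++;
            }
        }
        return n;
    }
}
```

With this, the write would become something like l.setProperty("lwa_lcm", FloatArraySanitizer.sanitize(sl.lcm, -1.0f)), and a non-zero countNonFinite result points at links whose score calculation needs a closer look.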
I haven't yet tried opening it from a java app - I will let you know if I
have issues.
What is a GH issue and how do I file one?
Rgds, John
On Saturday, August 13, 2016 at 11:32:44 AM UTC-7, Michael Hunger wrote:
>
> Hi John,
>
> thanks a lot for reporting back.
>
> Would you mind creating a GH issue (if possible reproducible with a
> minimal test)?
>
> Do you get a clean shutdown (db.shutdown()) of your program when creating
> the data?
>
> I haven't seen an error with that kind of property on recovery.
> Does the recovery error also happen when you open the db again from your
> java program?
>
> Thanks so much,
>
> Michael
>
>
>
> On Sat, Aug 13, 2016 at 6:25 PM, John Fry <[email protected]> wrote:
>
>> Hi Michael, using arrays to store the properties solved the performance
>> issues as you suggested. The application now easily completes 10x faster.
>>
>>
>> BUT it creates another problem. With the pseudo-code below I see the
>> following behaviour:
>>
>>
>> - when the array lcm contains 16 test values (all -1.0f) the
>> application runs at performance and I can open the db (via neo4j-shell)
>> and see that the relationships have 16 x -1.0f stored in a property array
>> - when lcm contains real and different values (e.g. 16 random floats)
>> the application runs at performance BUT the .db won't open in neo4j-shell;
>> it fails with the exception shown below
>> - if I limit the size of the lcm array to 2 or 4 real/random floats
>> then it works
>>
>> I am guessing the property stores are compressed or something?
>>
>>
>> Regards, John.
>>
>>
>>
>>
>> public class ScoredLink {
>>     long id;
>>     float[] lcm = new float[16];
>> ..........etc
>>
>> public static void main(String[] args)
>>     // ...do the math and score the 200M links local, in-memory
>>     // open the neo4j db
>>     // create batches of 500 relationships/links to write back
>>     // push the batches into a thread pool
>>     // for each thread....
>>     try ( Transaction tx = db.beginTx() ) {
>>         for (int i=start; i<=end; i++) { // i.e. one batch of about 500
>>             ScoredLink sl = scoredLinks.get(i);
>>             Relationship l = db.getRelationshipById(sl.id);
>>             l.setProperty("lwa_lcm", sl.lcm); // all 16 lcm vals
>>         }
>>         // mark for commit while tx is still in scope;
>>         // try-with-resources then calls tx.close()
>>         tx.success();
>>     }
>> ..........etc
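
A minimal sketch of the batching and thread-pool steps in the pseudo-code above, using only the JDK. BatchedWriter, BatchTask, ranges and runAll are hypothetical names, and BatchTask is a stand-in for the per-batch Neo4j transaction, which is not shown. It illustrates one point only: every batch has finished (or failed loudly) before the caller moves on to shut the database down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: splits the work into batches and waits for all of them.
final class BatchedWriter {
    /** Stand-in for one batch; the real app would open a transaction here. */
    interface BatchTask {
        void write(int startInclusive, int endExclusive) throws Exception;
    }

    /** Splits [0, total) into half-open ranges of at most batchSize items. */
    static List<int[]> ranges(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            out.add(new int[] {start, Math.min(start + batchSize, total)});
        }
        return out;
    }

    /** Runs every batch on a fixed pool and returns only after all have completed. */
    static void runAll(int total, int batchSize, int threads, BatchTask task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (int[] r : ranges(total, batchSize)) {
                Callable<Void> job = () -> {
                    task.write(r[0], r[1]);
                    return null;
                };
                pending.add(pool.submit(job));
            }
            for (Future<Void> f : pending) {
                f.get(); // blocks until that batch is done; surfaces its failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as BatchedWriter.runAll(scoredLinks.size(), 500, 8, (from, to) -> { ... }) would then be followed by db.shutdown(); the thread count of 8 is an arbitrary assumption.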
>>
>>
>>
>> ubuntu@ip-172-31-3-11:/opt/RAI/bin$ sudo neo4j-shell -v -path
>> /opt/neo4j/data/graph.db/
>> ERROR (-v for expanded information):
>> Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory,
>> /opt/neo4j/data/graph.db
>> java.lang.RuntimeException: Error starting
>> org.neo4j.kernel.impl.factory.CommunityFacadeFactory,
>> /opt/neo4j/data/graph.db
>> at
>> org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:143)
>> at
>> org.neo4j.kernel.impl.factory.CommunityFacadeFactory.newFacade(CommunityFacadeFactory.java:43)
>> at
>> org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:108)
>> at
>> org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:129)
>> at
>> org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:117)
>> at
>> org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:185)
>> at
>> org.neo4j.shell.kernel.GraphDatabaseShellServer.instantiateGraphDb(GraphDatabaseShellServer.java:203)
>> at
>> org.neo4j.shell.kernel.GraphDatabaseShellServer.<init>(GraphDatabaseShellServer.java:66)
>> at
>> org.neo4j.shell.StartClient.getGraphDatabaseShellServer(StartClient.java:282)
>> at
>> org.neo4j.shell.StartClient.tryStartLocalServerAndClient(StartClient.java:259)
>> at org.neo4j.shell.StartClient.startLocal(StartClient.java:247)
>> at org.neo4j.shell.StartClient.start(StartClient.java:180)
>> at org.neo4j.shell.StartClient.main(StartClient.java:135)
>> Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component
>> 'org.neo4j.kernel.recovery.Recovery@10c38489' failed to initialize. Please
>> see attached cause exception.
>> at
>> org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:434)
>> at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:66)
>> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:102)
>> at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:600)
>> at
>> org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
>> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
>> at
>> org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:112)
>> at
>> org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
>> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
>> at
>> org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:139)
>> ... 12 more
>> Caused by: java.lang.IllegalArgumentException: Unknown entry type 7 for
>> version 0. At position LogPosition{logVersion=170, byteOffset=16} and entry
>> version V1_9
>> at
>> org.neo4j.kernel.impl.transaction.log.entry.LogEntryVersion.entryParser(LogEntryVersion.java:207)
>> at
>> org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:92)
>> at
>> org.neo4j.kernel.impl.transaction.log.LogEntryCursor.next(LogEntryCursor.java:54)
>> at
>> org.neo4j.kernel.recovery.LatestCheckPointFinder.find(LatestCheckPointFinder.java:77)
>> at
>> org.neo4j.kernel.recovery.PositionToRecoverFrom.apply(PositionToRecoverFrom.java:53)
>> at
>> org.neo4j.kernel.recovery.DefaultRecoverySPI.getPositionToRecoverFrom(DefaultRecoverySPI.java:135)
>> at org.neo4j.kernel.recovery.Recovery.init(Recovery.java:72)
>> at
>> org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:424)
>> ... 21 more
>>
>> -host Domain name or IP of host to connect to (default: localhost)
>> -port Port of host to connect to (default: 1337)
>> -name RMI name, i.e. rmi://<host>:<port>/<name> (default: shell)
>> -pid Process ID to connect to
>> -c Command line to execute. After executing it the shell exits
>> -file File containing commands to execute, or '-' to read from
>> stdin. After executing it the shell exits
>> -readonly Connect in readonly mode (only for connecting with -path)
>> -path Points to a neo4j db path so that a local server can be
>> started there
>> -config Points to a config file when starting a local server
>>
>> Example arguments for remote:
>> -port 1337
>> -host 192.168.1.234 -port 1337 -name shell
>> -host localhost -readonly
>> ...or no arguments for default values
>> Example arguments for local:
>> -path /path/to/db
>> -path /path/to/db -config /path/to/neo4j.config
>> -path /path/to/db -readonly
>>
>>
>>
>> On Tuesday, August 9, 2016 at 1:32:21 PM UTC-7, Michael Hunger wrote:
>>>
>>> Oh sorry, I might have misunderstood you.
>>>
>>> Do you see the performance issue when creating the data or when
>>> accessing it?
>>>
>>> Could you share your graph-creation code?
>>>
>>> M
>>>
>>> On Tue, Aug 9, 2016 at 3:51 PM, John Fry <[email protected]> wrote:
>>>
>>>> Hi Michael, thanks...
>>>>
>>>> some more background info on the queries:
>>>> * note I am using neo ver 2.2 (I guess I should finally upgrade to 3+)
>>>> * everything I do is via the java api
>>>> * the queries are traversals and expansions:
>>>> --- I walk the graph node to node selecting each node by a function of
>>>> the weight vectors
>>>> --- I expand around a node to a depth of n for both incoming and
>>>> outgoing directions
>>>> --- I commonly use shortest path using dijkstra with my own cost
>>>> evaluators that use the weight vectors
>>>> --- once I have a reliable way to write all the properties I will use
>>>> the graph exclusively in 'read-only' mode. I only write the properties as
>>>> part of a graph creation process which is a single event usage - fast and
>>>> predictable creation of course is nice to achieve.
>>>>
>>>> I turn off transaction logging with: keep_logical_logs=false.
>>>>
>>>> Let me try using an integer array as a single property and see how that
>>>> performs.
>>>>
>>>> Thanks, John.
>>>>
>>>>
>>>> On Tuesday, August 9, 2016 at 3:35:50 AM UTC-7, Michael Hunger wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> Which kind of "transaction logging" did you turn off?
>>>>>
>>>>> Would you be able to share the queries you are using?
>>>>>
>>>>> each double property takes 8 bytes of storage in the property-record
>>>>> (which are linked in a chain, each property-record can hold up to 4
>>>>> 4-byte-storage properties).
>>>>>
>>>>> But arrays are optimized, esp. if you have small values in your
>>>>> weights it tries to use only the significant bits to encode values in an
>>>>> array (but I think it might only do that for integer values).
>>>>>
>>>>> Would you be able to run a test where instead of having 5-10
>>>>> individual properties you just use an array with that many entries?
>>>>>
>>>>> And perhaps even better, project the floating point values to
>>>>> integer values in that array.
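
A minimal sketch of that projection, assuming a fixed-point encoding with a caller-chosen scale factor. FixedPointWeights, encode, decode and the scale of 1000 in the usage note are illustrative assumptions, not something prescribed by Neo4j; the scale sets how many decimal places survive the round trip.

```java
// Hypothetical helper: stores float weights as scaled integers and back.
final class FixedPointWeights {
    private FixedPointWeights() {}

    /** Encodes each weight as Math.round(w * scale); non-finite inputs are rejected. */
    static int[] encode(float[] weights, int scale) {
        int[] out = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            float w = weights[i];
            if (!Float.isFinite(w)) {
                throw new IllegalArgumentException("non-finite weight at index " + i);
            }
            out[i] = Math.round(w * scale);
        }
        return out;
    }

    /** Reverses encode(), up to the rounding error introduced by the scale. */
    static float[] decode(int[] stored, int scale) {
        float[] out = new float[stored.length];
        for (int i = 0; i < stored.length; i++) {
            out[i] = (float) stored[i] / scale;
        }
        return out;
    }
}
```

For example, FixedPointWeights.encode(weights, 1000) keeps three decimal places per weight, and the resulting int[] can be set as a single array property in place of the individual float properties.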
>>>>>
>>>>> I will also ask our kernel engineers for other tips in this regard.
>>>>>
>>>>> HTH,
>>>>>
>>>>> Michael
>>>>>
>>>>> On Mon, Aug 8, 2016 at 6:56 PM, John Fry <[email protected]> wrote:
>>>>>
>>>>>> Hello Michael,
>>>>>>
>>>>>> the graph is used as follows:
>>>>>>
>>>>>> - ~10M nodes; ~200M relationships
>>>>>> - Each relationship requires multiple floating-point properties that
>>>>>> can be considered connecting strength weights. These multiple weights
>>>>>> make up a weight vector - up to ~20 weights per vector
>>>>>> - The weights on the relationship are static (or at least they
>>>>>> rarely change)
>>>>>> - The weight vector is used to compute custom (very algorithmic
>>>>>> in nature) costs per link to drive node-to-node traversals,
>>>>>> expansions and to find cost based n-shortest paths
>>>>>> - The costs per link are calculated in as close to real time as
>>>>>> possible and are always different and are never stored or written
>>>>>> back to the relationships in the graph
>>>>>>
>>>>>> Regards, John.
>>>>>>
>>>>>> On Monday, August 8, 2016 at 12:12:13 AM UTC-7, Michael Hunger wrote:
>>>>>>>
>>>>>>> Hi John,
>>>>>>>
>>>>>>> Do you have more details on the properties that you add as well as
>>>>>>> your graph model and queries? Without these details it will be hard to
>>>>>>> help.
>>>>>>>
>>>>>>> It sounds a bit as if your property heavy relationships might be
>>>>>>> nodes in hiding.
>>>>>>>
>>>>>>> Cheers Michael
>>>>>>>
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On 08.08.2016 at 06:05, John Fry <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> In neo4j 2.3 what / where are the limits when storing properties on
>>>>>>> relationships?
>>>>>>>
>>>>>>> I have a graph with about 200M relationships and for each
>>>>>>> relationship I want to add floating point attributes as properties.
>>>>>>> Here is what I am experiencing:
>>>>>>>
>>>>>>> - adding 2 properties per rel - all works fine; very good
>>>>>>> performance
>>>>>>> - adding 5 properties per rel - start to see exceptions/crashes
>>>>>>> - can be fixed by turning off transaction logging - good performance
>>>>>>> - adding ~7 properties per rel - performance dramatically fades
>>>>>>> (10x slower) - occasional exceptions/crashes
>>>>>>> - adding ~10 properties per rel - performance stalls/stops -
>>>>>>> eventually will crash
>>>>>>>
>>>>>>> What is a realistic set of expectations for storing this many
>>>>>>> properties where the relationship store could easily exceed 20GB?
>>>>>>>
>>>>>>> Regards and thanks for any advice, John.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "Neo4j" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>