Hi John,

Thanks a lot for the detailed feedback.

GH issues is the bug tracker on GitHub: https://github.com/neo4j/neo4j/issues

Please report the NaN issues (which might not be so easy to resolve).
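
If it helps with a minimal reproduction, here is a sketch of the NaN trap from
your mail as a standalone helper. The class name NanTrap and the method
sanitize are made up for illustration, -1.0f is simply the sentinel you chose,
and nothing in it is Neo4j API.

```java
import java.util.Arrays;

class NanTrap {
    // Sentinel for "no valid score"; -1.0f mirrors the value used in the thread.
    static final float SENTINEL = -1.0f;

    // Returns a copy of the input with every NaN or infinite entry replaced by SENTINEL.
    static float[] sanitize(float[] values) {
        float[] out = Arrays.copyOf(values, values.length);
        for (int i = 0; i < out.length; i++) {
            if (!Float.isFinite(out[i])) {
                out[i] = SENTINEL;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] lcm = {0.5f, Float.NaN, Float.POSITIVE_INFINITY, 2.0f};
        System.out.println(Arrays.toString(sanitize(lcm))); // prints [0.5, -1.0, -1.0, 2.0]
    }
}
```

Running the same write once with and once without the sanitising step should
show whether the NaNs alone trigger the failure.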

More importantly, please also report that you had to disable the
transaction log (which is actually the write-ahead log that takes care of
recovering already committed transactions after machine or program crashes).

I presume you did not run out of disk space? You can configure the
transaction log either to be disabled, or to retain only a certain amount
of data or a certain number of days.
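
As a rough sketch, these are the variants of the keep_logical_logs setting you
already use in the 2.x line. The numbers are only illustrative, and only one of
the lines would go into your config:

```properties
# Do not retain old transaction logs
keep_logical_logs=false
# Or: retain logs for a number of days
keep_logical_logs=7 days
# Or: retain up to a given amount of log data
keep_logical_logs=100M size
```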

Michael


On Sun, Aug 14, 2016 at 5:07 AM, John Fry <[email protected]> wrote:

> Hi Michael, here is an update
>
>    - I found that I was writing NaNs into the array which caused the
>    exception in yellow. I fixed this with a simple trap:
>
> if (Float.isFinite(lcm))
>     sl.lcm[idx] = lcm;
> else
>     sl.lcm[idx] = -1.0f;
>
>    - When this was fixed I could write to all 200M relationships and the
>    db would open in the neo4j-shell - BUT it would flag an exception when
>    exiting & closing, i.e. it wouldn't do a clean shutdown.
>       - By turning off all transaction logging the db now opens and
>       closes without issues and all 200M 16-element float arrays are
>       successfully written.
>
> So, to get this working I have to trap for NaNs and turn off transaction
> logging then my app will write all 200M property arrays and open and close
> cleanly in neo4j-shell.
> I haven't yet tried opening it from a java app - I will let you know if I
> have issues.
>
> What is a GH issue and how do I file one?
>
> Rgds, John
>
>
>
>
> On Saturday, August 13, 2016 at 11:32:44 AM UTC-7, Michael Hunger wrote:
>>
>> Hi John,
>>
>> thanks a lot for reporting back.
>>
>> Would you mind creating a GH issue (if possible reproducible with a
>> minimal test)?
>>
>> Do you get a clean shutdown (db.shutdown()) in your program when creating
>> the data?
>>
>> I haven't seen an error with that kind of property on recovery.
>> Does the recovery error also happen when you open the db again from your
>> java program?
>>
>> Thanks so much,
>>
>> Michael
>>
>>
>>
>> On Sat, Aug 13, 2016 at 6:25 PM, John Fry <[email protected]> wrote:
>>
>>> Hi Michael, using arrays to store the properties solved the performance
>>> issues as you suggested. The application is easily completing 10x faster.
>>>
>>>
>>> BUT it creates another problem. From the pseudo-code below I see the
>>> following behaviour:
>>>
>>>
>>>    - when the array lcm contains 16 test values (all -1.0f) the
>>>    application runs at performance and I can open the db (via neo4j-shell)
>>>    and see that the relationships have 16 x -1.0f stored in a property array
>>>    - when lcm contains real and different values (e.g. 16 random
>>>    floats) the application runs at performance BUT the .db won't open in
>>>    neo4j-shell - it fails with the exception shown below
>>>    - if I limit the size of the lcm array to 2 or 4 real/random floats
>>>    then it works
>>>
>>> I am guessing the property stores are compressed or something?
>>>
>>>
>>> Regards, John.
>>>
>>>
>>>
>>>
>>> public class ScoredLink {
>>>     long id;
>>>     float[] lcm = new float[16];
>>>     ..........etc
>>>
>>> public static void main(String[] args)
>>>     // ...do the math and score the 200M links locally, in memory
>>>     // open the neo4j db
>>>     // create batches of 500 relationships/links to write back
>>>     // push the batches into a thread pool
>>>     // for each thread....
>>>     try ( Transaction tx = db.beginTx() ) {
>>>         for (int i=start; i<=end; i++) { // i.e. one batch of 500
>>>             ScoredLink sl = scoredLinks.get(i);
>>>             Relationship l = db.getRelationshipById(sl.id);
>>>             l.setProperty("lwa_lcm", sl.lcm); // all 16 lcm vals
>>>         }
>>>         tx.success(); // mark success inside the try; the block closes tx
>>>     }
>>>     ..........etc
>>>
>>>
>>>
>>> ubuntu@ip-172-31-3-11:/opt/RAI/bin$ sudo neo4j-shell -v -path /opt/neo4j/data/graph.db/
>>> ERROR (-v for expanded information):
>>> Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory, /opt/neo4j/data/graph.db
>>> java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.CommunityFacadeFactory, /opt/neo4j/data/graph.db
>>> at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:143)
>>> at org.neo4j.kernel.impl.factory.CommunityFacadeFactory.newFacade(CommunityFacadeFactory.java:43)
>>> at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:108)
>>> at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:129)
>>> at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:117)
>>> at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:185)
>>> at org.neo4j.shell.kernel.GraphDatabaseShellServer.instantiateGraphDb(GraphDatabaseShellServer.java:203)
>>> at org.neo4j.shell.kernel.GraphDatabaseShellServer.<init>(GraphDatabaseShellServer.java:66)
>>> at org.neo4j.shell.StartClient.getGraphDatabaseShellServer(StartClient.java:282)
>>> at org.neo4j.shell.StartClient.tryStartLocalServerAndClient(StartClient.java:259)
>>> at org.neo4j.shell.StartClient.startLocal(StartClient.java:247)
>>> at org.neo4j.shell.StartClient.start(StartClient.java:180)
>>> at org.neo4j.shell.StartClient.main(StartClient.java:135)
>>> Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.recovery.Recovery@10c38489' failed to initialize. Please see attached cause exception.
>>> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:434)
>>> at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:66)
>>> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:102)
>>> at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:600)
>>> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
>>> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
>>> at org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:112)
>>> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
>>> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
>>> at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:139)
>>> ... 12 more
>>> Caused by: java.lang.IllegalArgumentException: Unknown entry type 7 for version 0. At position LogPosition{logVersion=170, byteOffset=16} and entry version V1_9
>>> at org.neo4j.kernel.impl.transaction.log.entry.LogEntryVersion.entryParser(LogEntryVersion.java:207)
>>> at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:92)
>>> at org.neo4j.kernel.impl.transaction.log.LogEntryCursor.next(LogEntryCursor.java:54)
>>> at org.neo4j.kernel.recovery.LatestCheckPointFinder.find(LatestCheckPointFinder.java:77)
>>> at org.neo4j.kernel.recovery.PositionToRecoverFrom.apply(PositionToRecoverFrom.java:53)
>>> at org.neo4j.kernel.recovery.DefaultRecoverySPI.getPositionToRecoverFrom(DefaultRecoverySPI.java:135)
>>> at org.neo4j.kernel.recovery.Recovery.init(Recovery.java:72)
>>> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:424)
>>> ... 21 more
>>>
>>>  -host      Domain name or IP of host to connect to (default: localhost)
>>>  -port      Port of host to connect to (default: 1337)
>>>  -name      RMI name, i.e. rmi://<host>:<port>/<name> (default: shell)
>>>  -pid       Process ID to connect to
>>>  -c         Command line to execute. After executing it the shell exits
>>>  -file      File containing commands to execute, or '-' to read from
>>> stdin. After executing it the shell exits
>>>  -readonly  Connect in readonly mode (only for connecting with -path)
>>>  -path      Points to a neo4j db path so that a local server can be
>>> started there
>>>  -config    Points to a config file when starting a local server
>>>
>>> Example arguments for remote:
>>> -port 1337
>>> -host 192.168.1.234 -port 1337 -name shell
>>> -host localhost -readonly
>>> ...or no arguments for default values
>>> Example arguments for local:
>>> -path /path/to/db
>>> -path /path/to/db -config /path/to/neo4j.config
>>> -path /path/to/db -readonly
>>>
>>>
>>>
>>> On Tuesday, August 9, 2016 at 1:32:21 PM UTC-7, Michael Hunger wrote:
>>>>
>>>> Oh sorry, I might have misunderstood you.
>>>>
>>>> Do you see the performance issue when creating the data or when
>>>> accessing it?
>>>>
>>>> Could you share your graph-creation code?
>>>>
>>>> M
>>>>
>>>> On Tue, Aug 9, 2016 at 3:51 PM, John Fry <[email protected]> wrote:
>>>>
>>>>> Hi Michael, thanks...
>>>>>
>>>>> some more background info on the queries:
>>>>> * note I am using neo ver 2.2 (I guess I should finally upgrade to 3+)
>>>>> * everything I do is via the java api
>>>>> * the queries are traversals and expansions:
>>>>> --- I walk the graph node to node selecting each node by a function of
>>>>> the weight vectors
>>>>> --- I expand around a node to a depth of n for both incoming and
>>>>> outgoing directions
>>>>> --- I commonly use shortest path using dijkstra with my own cost
>>>>> evaluators that use the weight vectors
>>>>> --- once I have a reliable way to write all the properties I will use
>>>>> the graph exclusively in 'read-only' mode. I only write the properties as
>>>>> part of a graph creation process which is a single event usage - fast and
>>>>> predictable creation of course is nice to achieve.
>>>>>
>>>>> I turn off transaction logging with: keep_logical_logs=false.
>>>>>
>>>>> Let me try using an integer array as a single property and see how
>>>>> that performs.
>>>>>
>>>>> Thanks, John.
>>>>>
>>>>>
>>>>> On Tuesday, August 9, 2016 at 3:35:50 AM UTC-7, Michael Hunger wrote:
>>>>>>
>>>>>> Hi John,
>>>>>>
>>>>>> Which kind of "transaction logging" did you turn off?
>>>>>>
>>>>>> Would you be able to share the queries you are using?
>>>>>>
>>>>>> each double property takes 8 bytes of storage in the property-record
>>>>>> (which are linked in a chain, each property-record can hold up to 4
>>>>>> 4-byte-storage properties).
>>>>>>
>>>>>> But arrays are optimized, esp. if you have small values in your
>>>>>> weights it tries to use only the significant bits to encode values in an
>>>>>> array (but I think it might only do that for integer values).
>>>>>>
>>>>>> Would you be able to run a test where instead of having 5-10
>>>>>> individual properties you just use an array with that many entries?
>>>>>>
>>>>>> And perhaps even better, project the floating point values to
>>>>>> integer values in that array.
>>>>>>
>>>>>> I will also ask our kernel engineers for other tips in this regard.
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 6:56 PM, John Fry <[email protected]> wrote:
>>>>>>
>>>>>>> Hello Michael,
>>>>>>>
>>>>>>> the graph is used as follows:
>>>>>>>
>>>>>>>    - ~10M nodes; ~200M relationships
>>>>>>>    - Each relationship requires multiple floating point properties that
>>>>>>>    can be considered connecting strength weights. These multiple
>>>>>>>    weights make up a weight vector - up to ~20 weights per vector
>>>>>>>    - The weights on the relationship are static (or at least they
>>>>>>>    rarely change)
>>>>>>>    - The weight vector is used to compute custom (very algorithmic
>>>>>>>    in nature) costs per link to drive node-to-node traversals,
>>>>>>>    expansions and to find cost based n-shortest paths
>>>>>>>    - The costs per link are calculated in as close to real time as
>>>>>>>    possible and are always different and are never stored or written
>>>>>>>    back to the relationships in the graph
>>>>>>>
>>>>>>> Regards, John.
>>>>>>>
>>>>>>> On Monday, August 8, 2016 at 12:12:13 AM UTC-7, Michael Hunger wrote:
>>>>>>>>
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> Do you have more details on the properties that you add as well as
>>>>>>>> your graph model and queries? Without these details it will be hard to
>>>>>>>> help.
>>>>>>>>
>>>>>>>> It sounds a bit as if your property heavy relationships might be
>>>>>>>> nodes in hiding.
>>>>>>>>
>>>>>>>> Cheers Michael
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>> On 08.08.2016 at 06:05, John Fry <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> In neo4j 2.3 what / where are the limits when storing properties on
>>>>>>>> relationships?
>>>>>>>>
>>>>>>>> I have a graph with about 200M relationships and for each
>>>>>>>> relationship I want to add floating point attributes as properties.
>>>>>>>> Here is what I am experiencing:
>>>>>>>>
>>>>>>>>    - adding 2 properties per rel - all works fine; very good
>>>>>>>>    performance
>>>>>>>>    - adding 5 properties per rel - start to see exceptions/crashes
>>>>>>>>    - can be fixed by turning off transaction logging - good performance
>>>>>>>>    - adding ~7 properties per rel -  performance dramatically
>>>>>>>>    fades (10x slower) - occasional exceptions/crashes
>>>>>>>>    - adding ~10 properties per rel - performance stalls/stops -
>>>>>>>>    eventually will crash
>>>>>>>>
>>>>>>>> What is a realistic set of expectations for storing this many
>>>>>>>> properties where the relationship store could easily exceed 20GB?
>>>>>>>>
>>>>>>>> Regards and thanks for any advice, John.
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Neo4j" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>