> GraphDatabaseService#getNodeById(long id) takes Neo4j internal ids.
Michael

> On 15.06.2015 at 20:59, Zongheng Yang <[email protected]> wrote:
>
> Hi Mattias,
>
> Thanks for looking into this. I understand the difference between Neo4j
> internal ids vs. the ids supplied in the csv.
>
> However, for say GraphDatabaseService#getNodeById(long id), does this function
> take the user-supplied ids or Neo4j's internal ids?
>
> If it is the former: then the conceptual mismatch doesn't fully explain the
> problem (e.g. I queried the nodes/edges using user-supplied ids, and the
> internal ids should not mess up the query results). If it is the
> latter, then for users programming using the Java Core API, how should they
> get these correct internal ids (they only know application-supplied ids)?
>
> Best,
> Zongheng
>
> On Monday, June 15, 2015 at 5:23:24 AM UTC-7, Mattias Persson wrote:
> Hello again, I'm quite confident I know what's happening here. The problem is
> the misconception that your INTEGER ids defined in the csv files will map
> 1-to-1 to the neo4j node/relationship ids in the store. They will actually
> match in most cases, but that's merely a coincidence.
>
> What you're seeing is the result of some parallelism happening in the
> importer, where batches of 10k nodes/relationships flow through different
> steps, where some steps may execute multiple batches in parallel and don't
> care if reordering happens. Ids are assigned at the end.
>
> You're looking at the ids and see that they mismatch, but if you look at
> their data you should see that all relationships match the csv files. So
> please disregard the seemingly close match of neo4j node/relationship ids
> with the csv input ids, as they are quite different in nature.
>
> On Thursday, June 11, 2015 at 11:32:55 AM UTC+2, Mattias Persson wrote:
> Hi, I'm one of the main authors of the import tool and I find this issue
> quite interesting.
>
> Would you be able to share your dataset with me personally, for the single
> purpose of trying to find the root cause?
>
> On Friday, June 5, 2015 at 5:12:43 AM UTC+2, Zongheng Yang wrote:
> Hi all,
>
> I'm using neo4j-import to import nodes and relationships from csv files.
> Let's say node id 538398 has about 100 edges and
>
> 538398 -> 370047
> 538398 -> 379981
>
> are just two of them. After the import, the neo4j database actually
>
> - *loses* these two edges
> - instead *corrupts* the destination ids, as follows
>
> 538398 -> 380047
> 538398 -> 389981
>
> - *keeps* all other outgoing edges of 538398 correct
>
> The problem seems to be non-deterministic: doing a `rm -rf dbPath` and
> re-running neo4j-import seems to fix the issue, for this particular node --
> but I've not done extensive tests to see whether other nodes get corrupted in
> this way.
>
> Has anyone seen this before? The graph has on the order of 1 million nodes,
> average degree 40.
>
> Zongheng
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
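
To make the distinction discussed in this thread concrete, here is a toy sketch in plain Java. It is not Neo4j code and does not reflect the importer's real internals: the class `IdMappingSketch` and all of its methods are hypothetical names invented for illustration. It only assumes what Mattias describes, namely that batches may complete out of order and that internal ids are assigned at the end. The list position stands in for the internal id, and a map kept by the application translates CSV ids to internal ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Neo4j code. All names here are hypothetical.
class IdMappingSketch {
    // "Store": the list position plays the role of the internal id,
    // the stored value is the application-supplied (CSV) id.
    final List<Long> store = new ArrayList<>();
    // Lookup kept by the application: CSV id -> internal id.
    final Map<Long, Long> csvToInternal = new HashMap<>();

    // Append batches in the order they happen to finish;
    // an internal id is assigned only when a record is appended.
    void importBatches(List<List<Long>> batchesInCompletionOrder) {
        for (List<Long> batch : batchesInCompletionOrder) {
            for (long csvId : batch) {
                long internalId = store.size();
                store.add(csvId);
                csvToInternal.put(csvId, internalId);
            }
        }
    }

    // Analogue of a lookup by internal id: returns the CSV id stored there.
    long csvIdAtInternalId(long internalId) {
        return store.get((int) internalId);
    }

    // Lookup by the id the application actually knows.
    long internalIdForCsvId(long csvId) {
        Long internalId = csvToInternal.get(csvId);
        if (internalId == null) {
            throw new IllegalArgumentException("unknown CSV id " + csvId);
        }
        return internalId;
    }

    public static void main(String[] args) {
        IdMappingSketch db = new IdMappingSketch();
        // Batches hold CSV ids 0-1, 2-3 and 4-5; the last two finish out of order.
        db.importBatches(Arrays.asList(
                Arrays.asList(0L, 1L),
                Arrays.asList(4L, 5L),
                Arrays.asList(2L, 3L)));
        // Treating CSV id 2 as an internal id lands on a different record.
        System.out.println(db.csvIdAtInternalId(2)); // prints 4
        // Translating the CSV id first finds the intended record.
        System.out.println(db.csvIdAtInternalId(db.internalIdForCsvId(2))); // prints 2
    }
}
```

CSV ids 0 and 1 happen to coincide with their internal ids, which mirrors the "match in most cases" coincidence mentioned above. In a real database the analogous step would be to look a node up by a property holding the CSV id rather than passing that id to getNodeById; that part is not shown here.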
