Re: [Neo4j] Re: neo4j-import non-deterministically corrupts a few node ids

Michael Hunger Mon, 15 Jun 2015 12:08:16 -0700

> GraphDatabaseService#getNodeById(long id)

takes Neo4j internal ids.
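
To make the two id spaces concrete, here is a toy model in plain Java. It is not the Neo4j API: `ToyStore`, `ToyNode`, `findByCsvId` and `importShuffled` are hypothetical names invented for this sketch, and the shuffle only stands in for the importer's batch reordering. It shows that an internal id is just the position a node happened to get in the store, so the application id has to be resolved through a lookup keyed on the csv value. In the real Core API the equivalent would be a property lookup such as GraphDatabaseService#findNode(label, key, value), assuming the csv id column was given a name so that it was stored as a node property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Toy model only: NOT the Neo4j API, just an illustration of the two id spaces. */
public class IdLookupSketch {

    /** Hypothetical stand-in for a stored node: internal id plus the id from the csv. */
    static final class ToyNode {
        final long internalId;
        final long csvId;

        ToyNode(long internalId, long csvId) {
            this.internalId = internalId;
            this.csvId = csvId;
        }
    }

    /** Hypothetical store: internal ids are handed out in arrival order. */
    static final class ToyStore {
        private final List<ToyNode> byInternalId = new ArrayList<>();
        private final Map<Long, ToyNode> byCsvId = new HashMap<>();

        void add(long csvId) {
            // The internal id is simply the next free slot, unrelated to csvId.
            ToyNode node = new ToyNode(byInternalId.size(), csvId);
            byInternalId.add(node);
            byCsvId.put(csvId, node);
        }

        /** Mirrors the role of getNodeById: expects an internal id. */
        ToyNode getNodeById(long internalId) {
            return byInternalId.get((int) internalId);
        }

        /** Lookup by the application-supplied id, via an index on that value. */
        ToyNode findByCsvId(long csvId) {
            return byCsvId.get(csvId);
        }
    }

    /** Imports the csv ids in a shuffled order to mimic reordered batches. */
    static ToyStore importShuffled(List<Long> csvIds, long seed) {
        List<Long> arrival = new ArrayList<>(csvIds);
        Collections.shuffle(arrival, new Random(seed));
        ToyStore store = new ToyStore();
        for (long csvId : arrival) {
            store.add(csvId);
        }
        return store;
    }

    public static void main(String[] args) {
        ToyStore store = importShuffled(Arrays.asList(370047L, 379981L, 538398L), 42L);
        ToyNode node = store.findByCsvId(538398L);
        System.out.println("csv id 538398 has internal id " + node.internalId);
        System.out.println("round trip csv id: " + store.getNodeById(node.internalId).csvId);
    }
}
```

Whatever order the nodes arrive in, going from the csv id to the node and then back through the internal id returns the same node, which is why the data stays correct even though the two numbers differ.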


Michael

> On 15.06.2015 at 20:59, Zongheng Yang <[email protected]> wrote:
> 
> Hi Mattias,
> 
> Thanks for looking into this.  I understand the difference between Neo4j 
> internal ids vs. the ids supplied in the csv. 
> 
> However, does a method such as GraphDatabaseService#getNodeById(long id) 
> take the user-supplied ids or Neo4j's internal ids?
> 
> If it is the former, then the conceptual mismatch doesn't fully explain the 
> problem (e.g. I queried the nodes/edges using user-supplied ids, so the 
> internal ids should not affect the query results).  If it is the 
> latter, then how should users programming against the Java Core API 
> get the correct internal ids, given that they only know the 
> application-supplied ids?
> 
> Best,
> Zongheng
> 
> On Monday, June 15, 2015 at 5:23:24 AM UTC-7, Mattias Persson wrote:
> Hello again, I'm quite confident I know what's happening here. The problem is 
> the misconception that your INTEGER ids defined in the csv files will map 
> 1-to-1 to the neo4j node/relationship ids in the store. They will actually 
> match in most cases, but that's merely a coincidence.
> 
> What you're seeing is the result of parallelism in the 
> importer, where batches of 10k nodes/relationships flow through different 
> steps; some steps may execute multiple batches in parallel and don't 
> care if reordering happens. Ids are assigned at the end.
> 
> You're looking at the ids and see that they mismatch, but if you look at 
> their data you should see that all relationships match the csv files. So 
> please disregard the seemingly close match of neo4j node/relationship ids 
> with the csv input ids as they are quite different in nature.
> 
> On Thursday, June 11, 2015 at 11:32:55 AM UTC+2, Mattias Persson wrote:
> Hi, I'm one of the main authors of the import tool and I find this issue 
> quite interesting.
> 
> Would you be able to share your dataset with me personally, for the single 
> purpose of trying to find the root cause?
> 
> On Friday, June 5, 2015 at 5:12:43 AM UTC+2, Zongheng Yang wrote:
> Hi all,
> 
> I'm using neo4j-import to import nodes and relationships from csv files. 
> Let's say node id 538398 has about 100 edges and
> 
> 538398 -> 370047
> 538398 -> 379981
> 
> are just two of them.  After the import, the neo4j database actually 
> 
> - *loses* these two edges
> - instead *corrupts* the destination ids, as follows
> 
>     538398 -> 380047
>     538398 -> 389981
> 
> - *keeps* all other outgoing edges of 538398 correct
> 
> The problem seems to be non-deterministic: doing a `rm -rf dbPath` and 
> re-running neo4j-import seems to fix the issue, for this particular node -- 
> but I've not done extensive tests to see whether other nodes get corrupted in 
> this way.
> 
> Has anyone seen this before? The graph has on the order of 1 million nodes, 
> with an average degree of 40. 
> 
> Zongheng
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
