How many duplicates do you expect? To take care of duplicates you'd have to read from the index, which is not advised for really high-performance imports. You'd also have to flush the index after each insert (or before each read) to make the just-added entries available for reading, which is also prohibitively expensive.
That's why the batch inserter has no notion of uniqueness yet. The label-based indexes are only populated at the end (shutdown) of the batch inserter, so they won't be available during the import either.

I recommend either doing a pre-pass over the CSV files or keeping an in-memory structure during the import that you use to check for duplicates. You could also import the data as-is and remove the duplicates later, which might be the best option. Personally I'd go for cleaning up the CSV files first, and otherwise the in-memory structure (e.g. a map).

On Tue, Mar 25, 2014 at 7:39 PM, natalie Yosef <[email protected]> wrote:
> In the second option, is it also possible when indexing is done by labels?
>
> Actually my problem is: when I insert nodes, their ids are the person's
> social security number. Then I get another Excel file of people that might
> contain duplicates of the social security number, and in that case I don't
> want it to create a new node with the same social security number.
>
> With the Java API I use:
>
> findNodeByProperty("socialSecurityNumber", 222222);
> if not exists then
>     create node
>
> Is this possible somehow with BatchInserter code?

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
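The in-memory duplicate check recommended above could look roughly like this. This is a minimal sketch: the class and method names (DedupImport, importPerson) are illustrative, and the actual node creation (which in real code would be a BatchInserter.createNode call) is stubbed with a plain counter so the example runs standalone. The idea is just a map from the unique key (the social security number) to the node id already created for it.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of duplicate skipping during a batch import: an in-memory map
// from social security number to the node id already created for it.
public class DedupImport {

    // Stand-in for the ids a real BatchInserter.createNode(...) would return.
    private long nextNodeId = 0;

    // Maps the unique key (SSN) to the id of the node created for it.
    private final Map<String, Long> ssnToNodeId = new HashMap<>();

    // Returns the node id for this SSN, creating a node only on first sight.
    public long importPerson(String ssn) {
        Long existing = ssnToNodeId.get(ssn);
        if (existing != null) {
            return existing; // duplicate row: reuse the node, skip creation
        }
        long nodeId = nextNodeId++; // real code: inserter.createNode(props, label)
        ssnToNodeId.put(ssn, nodeId);
        return nodeId;
    }

    // Number of distinct nodes created so far.
    public int nodeCount() {
        return ssnToNodeId.size();
    }
}
```

Rows with an SSN you've already seen then map back to the existing node id instead of creating a second node. The trade-off is memory: the map has to hold every key for the whole import, which is usually fine for millions of short keys but worth measuring for very large datasets.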
