How many duplicates do you expect?

To take care of duplicates you'd have to read from the index, which is
not advised for really high-performance imports. You'd also have to
flush the index after each insert (or before each read) to make the
just-added entries available for reading, which is also prohibitive.

That's why the batch inserter has no notion of uniqueness yet. The
label-based indexes are also only populated at the end (shutdown) of the
batch inserter, so they won't be available during the import either.

I recommend either doing a pre-pass over the CSV files or keeping an
in-memory structure during the import that's used for checking duplicates.

You could also import them as-is and remove the duplicates afterwards,
which might be the best option.

Personally I'd go for the cleanup of the CSV files first, and then the
in-memory structure (e.g. a map).
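If you go the in-memory route, a minimal sketch could look like the
following. Note this is just an illustration: the CSV layout (SSN in the
first column), the helper names, and the commented-out batch-inserter call
are assumptions, not code from your import.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupImport {

    // Keep only the first row seen for each social security number
    // (assumed to be column 0). In the real import you would call
    // batchInserter.createNode(...) where indicated instead of
    // collecting the rows.
    static List<String[]> dedupe(List<String[]> rows) {
        Set<String> seenSsn = new HashSet<>();
        List<String[]> unique = new ArrayList<>();
        for (String[] row : rows) {
            String ssn = row[0];
            if (seenSsn.add(ssn)) { // add() returns false for duplicates
                unique.add(row);
                // long nodeId = inserter.createNode(propsFor(row), PERSON);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"222222", "Alice"},
                new String[]{"333333", "Bob"},
                new String[]{"222222", "Alice again"}); // duplicate SSN
        System.out.println(dedupe(rows).size()); // prints 2
    }
}
```

A HashSet of SSN strings stays cheap even for millions of rows, so it
usually fits comfortably in the heap you'd give the batch inserter anyway.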




On Tue, Mar 25, 2014 at 7:39 PM, natalie Yosef <[email protected]> wrote:

> In the second option, is it also possible when indexing is done by labels?
>
> And actually my problem is: when I insert nodes, say their ids are the
> person's social security number, and then I get another Excel file of
> people that might contain duplicates in the social security number, in
> that case I don't want it to create a new node with the same social
> security number.
>
> In the Java API case I use:
> findNodeByProperty("socialSecurityNumber", 222222);
> if not exist then
>   create node.
>
> Is it somehow possible with batchInsert code?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
