Hi Ken,

As you know, data cleansing and matching are often very domain-specific, so 
customers don't publish their algorithms.

There is some nice content from my colleague Brian Underwood on this topic.
http://blog.brian-underwood.codes/tag/master-data-management/ 

One general approach is to project use-case-specific views of the data, mashed 
up from many different data sources, into one graph database / graph model, run 
decision-support queries, and then either throw the graph away and recreate it 
when needed again, or keep it and update the data subsequently.

http://neo4j.com/use-cases/master-data-management/ 
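
For the tiered matching you describe below, here is a minimal Python sketch of 
the rule ordering, independent of Neo4j. It is only an illustration under 
assumptions: the names Person, fuzzy_eq, match_tier and the SCORES values are 
hypothetical, and "fuzzy" is assumed to mean "same length, at most one 
differing character", since your mail does not define it. The first rule that 
holds wins, so a pair is labelled with its strongest tier.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    ssn: str    # digits only, e.g. "123456789" (assumed normalised upstream)
    dob: str    # ISO date string, e.g. "1980-02-29"
    first: str
    last: str


def fuzzy_eq(a: str, b: str) -> bool:
    """Assumed fuzzy rule: same length and at most one differing character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def match_tier(a: Person, b: Person) -> Optional[int]:
    """Return the strongest tier (1..6) that the pair satisfies, else None."""
    ssn = a.ssn == b.ssn
    dob = a.dob == b.dob
    first = a.first == b.first
    last = a.last == b.last
    rules = [
        ssn and dob and first and last,            # 1: exact SSN, DOB, first, last
        ssn and dob,                               # 2: exact SSN, DOB
        ssn and fuzzy_eq(a.dob, b.dob) and last,   # 3: exact SSN, fuzzy DOB, exact last
        ssn and last,                              # 4: exact SSN, exact last
        fuzzy_eq(a.ssn, b.ssn) and dob and last,   # 5: fuzzy SSN, exact DOB, exact last
        dob and first and last,                    # 6: exact DOB, first, last
    ]
    for tier, matched in enumerate(rules, start=1):
        if matched:
            return tier
    return None


# Hypothetical scores per tier; the real weights are a domain decision.
SCORES = {1: 100, 2: 90, 3: 80, 4: 70, 5: 60, 6: 50}
```

The tier (or its score) could then be stored as a property on the 
relationship created between the two matched nodes.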

Cheers, Michael

> On 17.09.2015 at 18:03, Ken Petro <[email protected]> wrote:
> 
> Let me try to elaborate on this a little more now that I have gotten a better 
> understanding.  I am looking to implement the following basic matching 
> algorithm / hierarchy:
> 
> Exact SSN, Exact DOB, Exact First Name, Exact Last Name
> Exact SSN, Exact DOB
> Exact SSN, Fuzzy DOB, Exact Last Name
> Exact SSN, Exact Last Name
> Fuzzy SSN, Exact DOB, Exact Last Name
> Exact DOB, Exact First Name, Exact Last Name
> 
> I am thinking of using Cypher queries to establish relationships for each of 
> the 6 matching criteria defined above.  
> 
> Then after establishing those relationships I would assign a "score" to each 
> list of nodes that have a relationship, depending on which matching 
> relationship they had.  
> 
> From there I can determine a way to create a "master/parent" node based on 
> the children; I haven't thought that through yet.
> 
> Does this make any sense?
> 
> Any help / comments are appreciated.
> 
> On Monday, September 14, 2015 at 3:28:29 PM UTC-4, Ken Petro wrote:
> I have been doing research for what options exist for creating a "Master 
> Database" for the customer domain which is different from traditional "MDM" 
> software and approaches.  I have come across many articles that reference 
> leveraging Graph Databases for this, and Neo4J comes up a lot.
> 
> One question I have is what tools are used for data cleansing and data 
> matching in this approach.  Let's say we plan to use Neo4J for this project, 
> how would we go about building our matching algorithm?  Are there capabilities 
> within Neo4J that help with building out that matching process?
> 
> Thanks in advance.  
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

