Hi,
I'll describe what I'm planning to do, and I need your help determining
whether Google App Engine is a good fit for it. If it is, a pointer on how
to go about building it would help.
I have millions of business records, and new businesses are added
every day. Every time a new business is added, we need to determine
whether it already exists. We query our database for businesses with
keywords matching those entered by the user. The query runs on multiple
columns, and we return the best matches ranked by the number of tokens
that match.
Example:
Existing information:
Listing 1:
Business Name: Spacely Space Sprockets
Address: Ring 325, Satellite 63, Outer Space, Galaxy X271
Listing 2:
Business Name: Fred Flintstone Flasks
Address: #456, Bedrock, Stone Cave, Earth
Suppose my database contains the records above. Now a user
comes to add a new listing and enters:
Business Name: Space Ventura Quentin Tarantino
Address: God Father Street, Kill Stone, Outer Mafia, Folsom Prison
My search would find that the new record matches both existing
Listing 1 and Listing 2.
In Listing 1, the 'Business Name' column matches one of the keywords
('space') in the newly entered business name. The 'Address' columns of
Listing 1 and Listing 2 each have one match with the newly entered
address (Listing 1 has 'outer', while Listing 2 has 'stone').
Since Listing 1 has 2 matches against the newly entered data and
Listing 2 has only 1, Listing 1 would be displayed above Listing 2 as a
duplicate suggestion.
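
To make the matching rule concrete, here is a minimal Python sketch of the
ranking described above. It is only an illustration under some assumptions:
the function names (tokenize, match_score, suggest_duplicates) and the
"name"/"address" keys are made up for this example, tokens are compared
case-insensitively per column, and the listings are scanned in memory. A
real system at millions of records would need some kind of token index
rather than a full scan.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_score(new, existing, columns=("name", "address")):
    """Count tokens shared by the two records, comparing column by column."""
    return sum(len(tokenize(new[c]) & tokenize(existing[c])) for c in columns)

def suggest_duplicates(new, listings):
    """Return (score, listing) pairs with at least one match, best first."""
    scored = [(match_score(new, listing), listing) for listing in listings]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Sample data taken from the example above.
listings = [
    {"name": "Spacely Space Sprockets",
     "address": "Ring 325, Satellite 63, Outer Space, Galaxy X271"},
    {"name": "Fred Flintstone Flasks",
     "address": "#456, Bedrock, Stone Cave, Earth"},
]
new_listing = {
    "name": "Space Ventura Quentin Tarantino",
    "address": "God Father Street, Kill Stone, Outer Mafia, Folsom Prison",
}

results = suggest_duplicates(new_listing, listings)
for score, listing in results:
    print(score, listing["name"])
```

With the sample data, Listing 1 scores 2 ('space' in the name, 'outer' in
the address) and Listing 2 scores 1 ('stone' in the address), so Listing 1
is suggested first.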
This is what I want to do. Please keep in mind that the data set would
start at 10 to 15 million records, and we hope to reach 50 million over
time. Your help would be greatly appreciated.
Sorry about the long post!
-Nischal