Scott Klarenbach <[EMAIL PROTECTED]> wrote on 05/27/2005 06:57:19 
PM:

> -- How is 'xFxIxExLxDx1x' easier to search than 'FIELD1'?
> 
> It's not easier.  It is; however, accurate for the purpose at hand. 
> FIELD1 isn't completely inclusive...it would miss FIELD-1...
> 
> -- However, It seems to me that this kind of data manipulation
> (cleanup) needs to happen BEFORE the data enters the databse.
> 
> Sometimes FIELD-1 is the ACTUAL data, with no erroneus
> characters...and sometimes it is FIELD1 with an erroneous (or
> unwanted) - character before the 1)...so cleaning the data would
> actually be corrupting some of it.
> 
> I need to keep the data in it's original form, but also allow for
> querying without worrying about the special characters inside the
> column.
> 
> FYI, these are part numbers off of electronic components, many of them
> coming from China...so, a Cisco part may have an MPN of RX321, or
> RX321-TR...either is valid.  Now, the corresponding Chinese part
> number for the first one, may come back as RX32-1...which is out of my
> control.
> 
> Cleaning the data would be the wrong approach, because it would
> actually invalidate the second Cisco part number, which MEANT to
> include the special characters.
> 
> For this reason, the user wants to be able to search for 'RX321' and
> 'RX321TR' respectively, and not worry about whether the data is
> erroneous or valid; just to basically ignore all the characters and
> let a human decide what they want.
> 
> If you have a more elegant solution, I'm all ears :-).
> 
> On 5/27/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > 
> > 
> > Scott Klarenbach <[EMAIL PROTECTED]> wrote on 05/27/2005 
04:38:40
> > PM:
> > 
> > 
> > > Ya, it is a little too specific...here's why I need it.
> > > 
> > > I have a client that wants to search for part numbers in his DB. The
> > > problem is, they come into his DB from external sources, with all
> > > sorts of special characters in them...
> > > 
> > > So, he has fields like 
> > > field_one!,
> > > fi--eld   2,
> > > @fi#eld__3xxx
> > > 
> > >  etc
> > > 
> > > but, he wants to do a search for 'fieldone' and return the first 
one,
> > > 'field2' returns the second, etc...basically disregard all 
non-alphas
> > > padding every character in the search string.
> > > 
> > > On 5/27/05, Eric Bergen <[EMAIL PROTECTED]> wrote:
> > > > I'm working on a set of UDFs for preg functions.
> > > > 
> > > > 
> > > > [EMAIL PROTECTED] wrote:
> > > > 
> > > > >I have a hard time figuring out when you would use such a 
function. I
> > do
> > > > >not believe you will be able to duplicate this behavior without
> > > > >constructing your own UDF or by writing a stored procedure. BTW, 
why
> > *do*
> > > > >you want this function?
> > > > >
> > > > >Shawn Green
> > > > >Database Administrator
> > > > >Unimin Corporation - Spruce Pine
> > > > >
> > > > >Scott Klarenbach <[EMAIL PROTECTED]> wrote on 05/27/2005
> > 01:30:35
> > > > >PM:
> > > > >
> > > > >
> > > > >
> > > > >>I'm trying to replicate this PHP behavior in a MySQL stored 
procedure.
> > > > >> The purpose is to pad every character of the string with a pad
> > > > >>character.  For example, if the pad character is 'x' and 
thestring is
> > > > >>'STRING', the result is 'xSxTxRxIxNxGx'.
> > > > >>
> > > > >>Here is the PHP code if it helps.  I'd like to use a regular
> > > > >>expression to replace, but I guess I could loop through the 
string
> > > > >>char by char and build a new one, it's just less elegant. Thanks 
in
> > > > >>advance.
> > > > >>
> > > > >>PHP:
> > > > >>-------------
> > > > >>$regPattern =  implode('x', preg_split('//', "STRING", -1,
> > > > >>PREG_SPLIT_NO_EMPTY));
> > > > >>
> > 
> > Thank you very much. I find this whole "padding" process very
> > counterintuitive. I have a few minor questions, if you don't mind. How 
is
> > 'xFxIxExLxDx1x' easier to search than 'FIELD1'?  Would you, could you 
please
> > explain the theory behind why and when this kind of padding should be 
done?
> > What problem does it solve and how is it a solution to that problem? 
This
> > is completely baffling to me and I thought I had seen a lot of weird 
data
> > before :-) 
> > 
> > However, It seems to me that this kind of data manipulation (cleanup) 
needs
> > to happen BEFORE the data enters the databse. What data import 
tool/process
> > is your client using? Can you not change the import process to scrub 
the
> > data and does it not have a better facility to interleave padding into 
a
> > string than a MySQL stored procedure or UDF? 
> > 
> > Thanks for you patience! 
> > 
> > 
> > Shawn Green
> > Database Administrator
> > Unimin Corporation - Spruce Pine 
> > 
> >

I would suggest the following technique. Keep your original data (trimmed 
of leading and trailing whitespace characters) in one field and a 
"cleansed" version of the same information in a second field. By cleansing 
you should eliminate all non-alphanumerics from your search string. 

I have some direct insight to searching similar but different product 
codes (same product/different codes & different products/same codes). My 
wife works for a company that consolidates medical inventories for major 
hospital and health care organizations and there is usually more than one 
vendor to the same "product". This leads to the exact same inventory code 
confusion you describe. In the majority of their data, the vendor will 
either prepend or append a manufacturer's id code or their own id code to 
the "normal" part number. Part of my wife's job is to filter through the 
confusion and match identical products to each other (based on vendor's 
ID, manufacturer's id, product codes, and descriptions) so that her 
clients no longer need to purchase the exact same thing from multiple 
sources (at a higher cost per unit) when they could be buying in bulk from 
a single source saving millions of dollars. 

That's part of how they do it. I figure it may work for you too.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Reply via email to