New discussion topic on DataCleaner's online discussion forum 
(http://datacleaner.org/forum):

SteveH posted the subject 'Best practice for build complex pipelines?'

-------------------

Hi,

I have a couple of specific technical questions which I will post separately - 
but my biggest question as I learn how to use the system is a more general one 
about what the "best practice" is for complex pipelines.

Using the Customers.csv example data, let's say I want to make a series of 
transformations on Name, Address and Date of Birth. Each of these would be 
complex in its own right, and they would also interact with each other. I am 
struggling to work this out at the moment.

For example: let's say I want to take in the Country code, capitalise it, do a 
Synonym lookup to convert the strings to 3-char country codes, and use 
Country-Standardiser to convert the 2-char codes to 3-char codes.

I can do that bit.
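
To make the intent concrete, here is a rough sketch of that country step in 
plain Python rather than in DataCleaner itself. The lookup tables and the 
function name standardise_country are made up for illustration; a real job 
would use much larger tables.

```python
# Hypothetical lookup tables, for illustration only.
SYNONYMS = {"UNITED KINGDOM": "GBR", "GREAT BRITAIN": "GBR", "FRANCE": "FRA"}
ALPHA2_TO_ALPHA3 = {"GB": "GBR", "FR": "FRA"}


def standardise_country(raw):
    """Return a 3-char country code, or None if the value is not recognised."""
    value = raw.strip().upper()          # capitalise and trim
    if value in SYNONYMS:                # synonym lookup: names -> 3-char codes
        return SYNONYMS[value]
    if value in ALPHA2_TO_ALPHA3:        # 2-char codes -> 3-char codes
        return ALPHA2_TO_ALPHA3[value]
    if len(value) == 3 and value.isalpha():
        return value                     # already looks like a 3-char code
    return None
```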

Now - I want to work on names. But - I want to apply different filters and 
transforms depending on whether the Country code is DRK or GBR. Perhaps I need, 
say, 10 transforms/filters for GBR names and 15 for DRK names.

This is where I get totally stuck - I just can't see a way of doing this at 
all. My first guess was that I needed to write two sets of temp tables, then 
run the name processes against those tables - however, I can't see a way of 
doing this, because I can't connect the staging tables to other jobs. I had a 
look at bringing all the data back together after each logical path with a 
Union, but I'm not sure that's the way to go either.
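
In plain Python terms (again, not DataCleaner itself), the shape I am after is 
roughly the sketch below: route each record down a country-specific branch, 
then bring the results back together. The function names and the placeholder 
transforms are hypothetical; each branch stands in for the 10 or 15 real steps.

```python
def gbr_name_steps(record):
    # Placeholder for the GBR-specific name transforms/filters.
    return {**record, "name": record["name"].title()}


def drk_name_steps(record):
    # Placeholder for the DRK-specific name transforms/filters.
    return {**record, "name": record["name"].upper()}


# One branch per country code (codes as written in this post).
BRANCHES = {"GBR": gbr_name_steps, "DRK": drk_name_steps}


def process(records):
    """Apply the branch matching each record's country, then union the results."""
    out = []
    for record in records:
        branch = BRANCHES.get(record["country"])
        # Records with no matching branch pass through unchanged.
        out.append(branch(record) if branch else record)
    return out
```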

I guess this is a really long way of asking "What's the best pattern to use when 
building very large, complex jobs which contain a lot of conditional logic 
based on previous steps?"

Many thanks

Steve

-------------------

View the topic online to reply - go to 
http://datacleaner.org/topic/1098/Best-practice-for-build-complex-pipelines%3F
