New discussion topic on DataCleaner's online discussion forum (http://datacleaner.org/forum):
SteveH posted the subject 'Best practice for build complex pipelines?'
-------------------
Hi,

I have a couple of specific technical questions which I will post separately, but my biggest question as I learn how to use the system is a more general one: what is the "best practice" for complex pipelines?

Using the Customers.csv example data, let's say I want to make a series of transformations on Name, Address and Date of Birth. Each of these would be complex in its own right, and they would also interact with each other. I am struggling to work this out at the moment.

For example: let's say I want to take in Country code, capitalise it, do a Synonym lookup to convert the strings to the 3-char country codes, and use Country-Standardiser to convert the two-char codes to 3-char codes. I can do that bit.

Now I want to work on names, but I want to apply different filters and transforms depending on whether the Country code is DRK or GBR. Perhaps I need, say, 10 transforms/filters for GBR names and 15 for DRK names. This is where I get totally stuck: I just can't see a way of doing this at all.

My first guess was that I needed to write two sets of temp tables, then run the name processes against those tables. However, I can't see a way of doing this: I can't connect the staging tables to other jobs. I had a look at bringing all the data back together after each logical path with a Union, but I'm not sure that's the way to go either.

I guess this is a really long way of asking "What's the best pattern to use when building very large, complex jobs which contain a lot of conditional logic based on previous steps?"

Many thanks
Steve
-------------------
View the topic online to reply - go to http://datacleaner.org/topic/1098/Best-practice-for-build-complex-pipelines%3F
