New reply on DataCleaner's online discussion forum (http://datacleaner.org/forum):
Kasper Sørensen replied to subject 'Best practice for build complex pipelines?'

-------------------

Hi Steve,

200 steps sounds like an awful lot. I'm mostly worried whether that will ever be a pleasant job to work with. I would imagine many output columns coming out of all those transformations, and many filter outcomes from the filters, essentially polluting the space of available columns and filter requirements. But maybe 200 is a conscious exaggeration? ... And actually I guess there is a third alternative too, which is to introduce a bit more scripting or extension-building to avoid needing that many steps.

I won't claim to know the best practice, TBH, but what I normally do is look for "understandable processes" when designing jobs. For instance, in a typical cleansing scenario I would have a "valid" (data is correct) scenario and a number of "rejection" scenarios. Rejected records are usually a lot more complex to process, so those I would put into a table dedicated to that, and I would probably add a "rejection code" or something like that when inserting the rejected records. That enables filtering on specific rejection reasons, which can then be the basis of new jobs that handle those rejections in various ways.

-------------------

View the topic online to reply - go to http://datacleaner.org/topic/1098/Best-practice-for-build-complex-pipelines%3F
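[Editor's note] Below is a minimal sketch of the extension-building route Kasper mentions: a single custom transformer that folds several small string-cleanup steps into one, so the job canvas needs one step instead of three or four. It assumes the DataCleaner 4.x extension API (org.datacleaner.api); the class name, display name and column names are made up for illustration, so check the developer documentation for the exact API of your version.

    import javax.inject.Named;

    import org.datacleaner.api.Configured;
    import org.datacleaner.api.InputColumn;
    import org.datacleaner.api.InputRow;
    import org.datacleaner.api.OutputColumns;
    import org.datacleaner.api.Transformer;

    // Hypothetical example: trims, uppercases and null-fills a column in one
    // step, instead of chaining three separate transformations in the job.
    @Named("Normalize name (combined cleanup)")
    public class NormalizeNameTransformer implements Transformer {

        @Configured
        InputColumn<String> nameColumn;

        @Override
        public OutputColumns getOutputColumns() {
            // One output column, replacing the intermediate columns that
            // three chained steps would otherwise have added to the job.
            return new OutputColumns(String.class, "name (normalized)");
        }

        @Override
        public Object[] transform(InputRow row) {
            String value = row.getValue(nameColumn);
            if (value == null) {
                return new Object[] { "" };  // null-fill
            }
            return new Object[] { value.trim().toUpperCase() };  // trim + uppercase
        }
    }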
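[Editor's note] And a sketch of the rejection-table pattern from the second half of the reply, independent of any particular tool: rejected records go into a dedicated table together with a rejection code, so follow-up jobs can filter on the specific rejection reason. The table, column and code names here are all hypothetical; only the shape of the pattern comes from the reply.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class RejectionSink {

        // Hypothetical rejection codes, one per filter outcome that
        // rejects a record; define a taxonomy that fits your own job.
        public enum RejectionCode { MISSING_EMAIL, INVALID_DATE, DUPLICATE_KEY }

        private final Connection connection;

        public RejectionSink(Connection connection) {
            this.connection = connection;
        }

        // Inserts the raw record plus its rejection code. Downstream jobs can
        // then filter on REJECTION_CODE and handle each reason separately.
        public void reject(String recordId, String rawRecord, RejectionCode code)
                throws SQLException {
            try (PreparedStatement stmt = connection.prepareStatement(
                    "INSERT INTO REJECTED_RECORDS (RECORD_ID, RAW_RECORD, REJECTION_CODE) "
                            + "VALUES (?, ?, ?)")) {
                stmt.setString(1, recordId);
                stmt.setString(2, rawRecord);
                stmt.setString(3, code.name());
                stmt.executeUpdate();
            }
        }
    }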
