New reply on DataCleaner's online discussion forum 
(http://datacleaner.org/forum):

Kasper Sørensen replied to subject 'Best practice for build complex pipelines?'

-------------------

Hi Steve,

200 steps sounds like an awful lot. I'm mostly worried whether that will ever be 
pleasant to work with. I would imagine many output columns from all those 
transformations, and many filter outcomes from the filters, essentially polluting 
the space of available columns and filter requirements. But maybe 200 is a 
conscious exaggeration?

... And actually I guess there is a third alternative as well, which is to 
introduce a bit more scripting or extension-building to avoid having that many steps.

I won't claim to know the best practice, TBH, but what I normally do is look 
for "understandable processes" when designing jobs. For instance, in a typical 
cleansing scenario I would have a "valid" (data is correct) scenario and a 
number of "rejection" scenarios. Rejected records are usually a lot more 
complex to process, so I would put those into a dedicated table, and probably 
add a "rejection code" or something like that when inserting the rejected 
records. That enables filtering on specific rejection reasons, which can then 
be the basis of new jobs that handle those rejections in various ways.
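To make the idea concrete, here is a minimal sketch in plain Python (not 
DataCleaner's actual API - in DataCleaner you would configure this through the 
job itself). The rule names and validation predicates are made up for 
illustration:

```python
# Hypothetical sketch of the valid/rejected split described above.
# Rule codes and predicates are invented for illustration only.

REJECTION_RULES = [
    # (rejection_code, predicate) -- first matching rule wins
    ("MISSING_EMAIL", lambda r: not r.get("email")),
    ("BAD_AGE",       lambda r: not (0 < r.get("age", -1) < 120)),
]

def route(record):
    """Return ('valid', record) or ('rejected', record with a rejection_code)."""
    for code, predicate in REJECTION_RULES:
        if predicate(record):
            # Tag the record so downstream jobs can filter on the reason.
            return "rejected", dict(record, rejection_code=code)
    return "valid", record

def process(records):
    """Split incoming records into a 'valid' table and a 'rejected' table."""
    valid_table, rejected_table = [], []
    for record in records:
        target, row = route(record)
        (valid_table if target == "valid" else rejected_table).append(row)
    return valid_table, rejected_table
```

Follow-up jobs would then read the rejected table and filter on 
rejection_code to handle each rejection reason separately.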

-------------------

View the topic online to reply - go to 
http://datacleaner.org/topic/1098/Best-practice-for-build-complex-pipelines%3F
