Hi Lionel,

The entry point for my data flow is CSV files, not Hive tables, and I want to run profiling jobs on them. These files will be subjected to profiling and health checks before moving into the data flow, and they will live on HDFS. Hence, I have a couple of questions:
1. Does profiling support files instead of Hive tables?
2. If yes, can I point my "data.source" to an HDFS directory instead of specifying a file each time, so that Griffin will run the profiling job on each newly added file in that HDFS location?

Thank you,
Karan Gupta

From: Lionel Liu <[email protected]>
Sent: Monday, May 28, 2018 1:58 PM
To: Karan Gupta <[email protected]>; [email protected]
Subject: Re: Apache Griffin Profiling

Hi Karan,

Do you mean that you want to put your profiling config files in an HDFS directory, and let Griffin scan the directory to pick up the config files at run time? The Griffin measure module doesn't support this at the moment; you can refer to the code entrance and implement your own param file reader if you want to do that:

https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/Application.scala#L170
https://github.com/apache/incubator-griffin/tree/master/measure/src/main/scala/org/apache/griffin/measure/config/reader

But in my opinion, it may not be appropriate to do such work in the measure module. This looks like scheduling work that belongs before submitting Griffin jobs.
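To make the "schedule work before submitting griffin jobs" suggestion concrete, here is a minimal wrapper sketch, not part of Griffin itself: a script that scans an HDFS directory for CSV files not yet profiled, renders a per-file dq config from a template, and submits one measure job per file. The template file name, the `<FILE_PATH>` placeholder, the `processed.txt` bookkeeping file, and the `measure.jar`/`env.json` paths are all assumptions for illustration; only the entry class `org.apache.griffin.measure.Application` comes from the thread above.

```shell
#!/usr/bin/env bash
# Hypothetical scheduler sketch (not part of Griffin): watch an HDFS
# directory and submit one Griffin measure job per new CSV file.
# Template name, <FILE_PATH> placeholder, and jar/env paths are assumptions.

HDFS_DIR="hdfs:///data/incoming"   # directory to watch (placeholder)
TEMPLATE="dq_template.json"        # dq config containing a <FILE_PATH> marker
DONE_LIST="processed.txt"          # local record of files already profiled

# Substitute the <FILE_PATH> placeholder in the template with a concrete
# HDFS file path, writing the per-file config to stdout.
render_config() {
  sed "s|<FILE_PATH>|$1|" "$TEMPLATE"
}

# Submit one Griffin measure job for a single file. The entry class is the
# one referenced in the thread; jar and env file names are placeholders.
submit_job() {
  render_config "$1" > dq_current.json
  spark-submit --class org.apache.griffin.measure.Application \
    measure.jar env.json dq_current.json
}

# List CSV files in the watched directory (the path is the last column of
# `hdfs dfs -ls` output) and submit a job for each file not seen before.
scan_and_submit() {
  touch "$DONE_LIST"
  hdfs dfs -ls "$HDFS_DIR" | awk '{print $NF}' | grep '\.csv$' |
  while read -r f; do
    grep -qxF "$f" "$DONE_LIST" && continue  # already profiled, skip
    submit_job "$f"
    echo "$f" >> "$DONE_LIST"
  done
}

# Only run the scan when the HDFS CLI is actually available.
if command -v hdfs >/dev/null 2>&1; then
  scan_and_submit
fi
```

Run from cron or an Oozie/Airflow task, this keeps the directory-watching logic outside the measure module, as Lionel suggests, while each submitted job still sees an ordinary single-file config.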
Thanks,
Lionel

On Mon, May 28, 2018 at 3:21 PM, Karan Gupta <[email protected]> wrote:

Hi Lionel,

Thank you for your response. I created a single custom rule for multiple sources. Now I am trying to run profiling jobs where the source is not tightly coupled inside a rule: I want to run profiling jobs by just pointing to an HDFS directory instead of a specific file (Griffin should pick up the file name from the directory at run time). Is it possible to do that through Griffin?

Thank you,
Karan Gupta
