Hi Karan,

Griffin currently supports reading from a text directory; you can use the "text-dir" type data connector:
https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/data/connector/batch/TextDirBatchDataConnector.scala
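A data source entry using this connector might look roughly like the following sketch. The config keys here ("dir.path", "data.dir.depth") are illustrative guesses, not confirmed Griffin API — check the connector source linked above for the exact names:

```json
{
  "data.sources": [
    {
      "name": "source",
      "connectors": [
        {
          "type": "text-dir",
          "config": {
            "dir.path": "hdfs:///griffin/data/input",
            "data.dir.depth": 1
          }
        }
      ]
    }
  ]
}
```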
But this connector can only scan the files at the nth depth of sub-directories, and it reads them as plain text, with no schema. You can give it a try, but I'm not sure it covers your case. The best way is to implement your own data connector; just refer to the one I listed above. You can also add a function to read a schema file; it's not complicated.

Thanks,
Lionel

On Mon, May 28, 2018 at 5:08 PM, Karan Gupta <[email protected]> wrote:

> Hi Lionel,
>
> The entry point for my data flow is CSV files, on which I want to run
> profiling jobs instead of Hive tables. These CSV files will be subjected
> to profiling and health checks before moving them into the data flow.
> Such files will be on HDFS. Hence, I have a couple of questions:
>
> Does profiling support files instead of Hive tables? If yes, can I point
> my "data.source" to an HDFS directory instead of specifying a file each
> time, so that Griffin will run the profiling job on each newly added
> file in that HDFS location?
>
> Thank you,
> Karan Gupta
>
> *From:* Lionel Liu <[email protected]>
> *Sent:* Monday, May 28, 2018 1:58 PM
> *To:* Karan Gupta <[email protected]>; [email protected]
> *Subject:* Re: Apache Griffin Profiling
>
> Hi Karan,
>
> Do you mean that you want to put your profiling config files in an HDFS
> directory, and let Griffin scan the directory to get the config files at
> run time?
> The Griffin measure module doesn't support this at the moment; you can
> refer to the code entrance and implement your own param file reader if
> you want to do that:
>
> https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/Application.scala#L170
> https://github.com/apache/incubator-griffin/tree/master/measure/src/main/scala/org/apache/griffin/measure/config/reader
>
> But in my opinion, it may not be appropriate to do such work in the
> measure module. This seems like scheduling work that belongs before
> submitting Griffin jobs.
>
> Thanks,
> Lionel
>
> On Mon, May 28, 2018 at 3:21 PM, Karan Gupta <[email protected]> wrote:
>
> Hi Lionel,
>
> Thank you for your response. I created a single custom rule for multiple
> sources. Now I am trying to run profiling jobs where my source is not
> tightly coupled inside a rule. I want to run profiling jobs by just
> pointing to an HDFS directory instead of a specific file (Griffin should
> pick up the file name from the directory at run time).
> Is it possible to do that through Griffin?
> Thank you,
> Karan Gupta
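To sketch the "add the function to read a schema file" suggestion from Lionel's first reply: one minimal approach is a small parser for a schema file with one `name:type` pair per line, whose output a custom data connector could then map onto a Spark StructType before loading the CSV files. The file format, names, and function below are all illustrative assumptions, not Griffin API:

```scala
// Sketch only: parse a simple schema file with one "name:type" pair per
// line (e.g. "id:int") into field descriptors. A custom data connector
// could map these onto a Spark StructType before reading the CSV files.
case class SchemaField(name: String, dataType: String)

def parseSchemaLines(lines: Seq[String]): Seq[SchemaField] =
  lines.map(_.trim)
    .filter(l => l.nonEmpty && !l.startsWith("#")) // skip blanks and comments
    .map { line =>
      // Split on the first ':' only, so types like "decimal(10,2)" survive
      val Array(n, t) = line.split(":", 2).map(_.trim)
      SchemaField(n, t)
    }
```

The `split(":", 2)` limit keeps everything after the first colon as the type string, and comment/blank lines are skipped so the schema file can be lightly annotated.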
