Hi Karan,

There's a mistake in my last email: the "where" conditions should be split by ",", not ";".

Thanks,
Lionel

On Wed, Apr 25, 2018 at 6:24 PM, Karan Gupta <[email protected]> wrote:

> Thanks Lionel,
>
> That was helpful.
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <[email protected]>
> Sent: Wednesday, April 25, 2018 3:49 PM
> To: Karan Gupta <[email protected]>
> Cc: [email protected]
> Subject: Re: Partition Configuration
>
> Hi Karan,
>
> Most hive tables in a production environment have partition columns, and
> it's common to use time as the partition column.
>
> For example, we have a hive table with partition columns "dt" and "hour",
> with a pattern like this: dt=20180406, hour=13.
>
> In griffin, we want to schedule a job to calculate the accuracy measure on
> the hive table hourly, but each time we only want to calculate the data of
> the last hour.
> e.g.: At 14:05, the data from 13:00 - 13:59 is ready, and we want to
> calculate the data saved in partition hour=13.
> For this usage, you can refer to
> https://cwiki.apache.org/confluence/display/GRIFFIN/5.+Griffin+Job+Scheduler+Design,
> griffin can schedule jobs with the data range by time.
>
> To achieve this goal, griffin needs to know the pattern of the partition
> column values, so we need to configure it in the "where" field like
> "dt=#YYYYMMdd# AND hour=#HH#". The griffin service will replace each time
> pattern between hash signs with the timestamp, and generate a concrete
> string like "dt=20180406 AND hour=13" to trigger the spark job.
>
> If you submit the griffin job directly to the spark cluster, without using
> the griffin service, you need to configure "where" as
> "dt=20180406 AND hour=13" directly, because the measure module needs the
> concrete where clause.
>
> PS:
> In this document:
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md#data-connector,
> "partition" is out of date. We configure it as "where" now, and the
> definition should be:
> where: where conditions string, split by ";", optional. e.g. "dt=20170410
> AND hour=15; dt=20170411 AND hour=15; dt=20170412 AND hour=15"
> It should be like a where clause, to filter data. We'll modify the
> document later.
>
> Thanks,
> Lionel
>
> On Wed, Apr 25, 2018 at 5:30 PM, Karan Gupta <[email protected]> wrote:
> Hi Lionel,
>
> Could you help me understand the partition configuration under the
> accuracy measure better, with an example?
>
> Thank you,
> Karan Gupta
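
For illustration only, the time-pattern substitution described in this thread could be sketched in Python roughly as follows. This is not griffin's actual implementation: the function names `expand_where` and `split_where` and the token table are hypothetical, and the mapping of tokens such as "YYYY" and "HH" to year and hour is an assumption based on the "dt=20180406 AND hour=13" example above. The split uses ",", following the correction at the top of the thread.

```python
import re
from datetime import datetime, timedelta

# Hypothetical token table (an assumption, not taken from griffin's code):
# maps the pattern tokens seen in this thread to strftime directives.
_TOKEN_TO_DIRECTIVE = {
    "YYYY": "%Y",
    "yyyy": "%Y",
    "MM": "%m",
    "dd": "%d",
    "HH": "%H",
    "mm": "%M",
}
_TOKEN_RE = re.compile("|".join(_TOKEN_TO_DIRECTIVE))


def expand_where(template: str, ts: datetime) -> str:
    """Replace each #pattern# in the template with the formatted timestamp."""

    def render(match: re.Match) -> str:
        # Translate the pattern between the hash signs in a single pass,
        # then format the timestamp with it.
        fmt = _TOKEN_RE.sub(lambda m: _TOKEN_TO_DIRECTIVE[m.group(0)], match.group(1))
        return ts.strftime(fmt)

    return re.sub(r"#([^#]+)#", render, template)


def split_where(where: str) -> list:
    """Split a "where" string into its individual conditions on ","."""
    return [cond.strip() for cond in where.split(",") if cond.strip()]


if __name__ == "__main__":
    # A job triggered at 14:05 wants the partition of the previous hour.
    run_time = datetime(2018, 4, 6, 14, 5)
    data_time = run_time - timedelta(hours=1)
    print(expand_where("dt=#YYYYMMdd# AND hour=#HH#", data_time))
    print(split_where("dt=20170410 AND hour=15, dt=20170411 AND hour=15"))
```

When a job is submitted directly to the spark cluster, no such expansion happens, so the "where" value has to be written out in its concrete form already.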
