Thanks Lionel, that was helpful.
Thank you,
Karan Gupta

From: Lionel Liu <[email protected]>
Sent: Wednesday, April 25, 2018 3:49 PM
To: Karan Gupta <[email protected]>
Cc: [email protected]
Subject: Re: Partition Configuration

Hi Karan,

Most Hive tables in a production environment have partition columns, and it's common to partition by time. For example, we have a Hive table with partition columns "dt" and "hour", with values like this: dt=20180406, hour=13.

In Griffin, we want to schedule a job that calculates the accuracy measure on the Hive table hourly, but each run should only cover the data of the last hour. E.g., at 14:05 the data from 13:00 to 13:59 is ready, so we want to calculate on the data saved in partition hour=13.

For this usage, you can refer to the job scheduler design:
https://cwiki.apache.org/confluence/display/GRIFFIN/5.+Griffin+Job+Scheduler+Design
Griffin can schedule jobs with a data range defined by time.

To achieve this, Griffin needs to know the pattern of the partition column values, so we configure it in the "where" field, like "dt=#YYYYMMdd# AND hour=#HH#". The Griffin service replaces each time pattern between hash signs with the timestamp and generates a concrete string like "dt=20180406 AND hour=13" to trigger the Spark job.

If you submit a Griffin job directly to the Spark cluster without using the Griffin service, you need to configure "where" as "dt=20180406 AND hour=13" directly, because the measure module needs the concrete where clause.
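
The placeholder substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Griffin's actual implementation: the function name render_where, the token table, and the one-hour offset between trigger time and data time are all assumptions made for this example, and only the pattern letters that appear in the example (YYYY, MM, dd, HH) are handled.

```python
import re
from datetime import datetime, timedelta

# Hypothetical mapping from the pattern letters used in the example
# to Python strftime codes. Only these four tokens are supported here.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}
_TOKEN_RE = re.compile("|".join(_TOKENS))


def render_where(template: str, data_time: datetime) -> str:
    """Replace every #pattern# in the template with the formatted data time."""

    def render_pattern(match: re.Match) -> str:
        # Translate e.g. "YYYYMMdd" into "%Y%m%d", then format the timestamp.
        fmt = _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], match.group(1))
        return data_time.strftime(fmt)

    return re.sub(r"#([^#]+)#", render_pattern, template)


# A job triggered at 14:05 that should read the previous hour's partition
# (the one-hour offset is an assumption taken from the example in the email).
trigger_time = datetime(2018, 4, 6, 14, 5)
data_time = trigger_time - timedelta(hours=1)
print(render_where("dt=#YYYYMMdd# AND hour=#HH#", data_time))
# prints: dt=20180406 AND hour=13
```

The rendered string is the kind of concrete where clause the measure module expects when a job is submitted to Spark directly.
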
PS: In this document:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md#data-connector
"partition" is out of date. We configure it as "where" now, and the definition should be:

where: where-conditions string, split by ";", optional.
e.g. "dt=20170410 AND hour=15; dt=20170411 AND hour=15; dt=20170412 AND hour=15"

It should be written like a where clause, to filter data. We'll update the document later.

Thanks,
Lionel

On Wed, Apr 25, 2018 at 5:30 PM, Karan Gupta <[email protected]> wrote:

Hi Lionel,

Could you help me better understand the partition configuration under the accuracy measure, with an example?

Thank you,
Karan Gupta
