Hi Karan,

Most Hive tables in production environments have partition columns, and
it's common to use time as the partition column.

For example, suppose we have a Hive table with partition columns "dt" and
"hour", with values like this: dt=20180406, hour=13.

In Griffin, suppose we want to schedule a job that calculates an accuracy
measure on this Hive table hourly, but each run should only calculate the
data of the last hour.
For example, at 14:05 the data from 13:00 to 13:59 is ready, and we want to
calculate the data saved in partition hour=13.
For this use case, you can refer to
https://cwiki.apache.org/confluence/display/GRIFFIN/5.+Griffin+Job+Scheduler+Design;
Griffin can schedule jobs over a data range defined by time.
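To make the "last hour" arithmetic concrete, here is a minimal Python sketch, assuming a job triggered at some time within an hour should process the previous full hour. The function name `last_complete_hour` is hypothetical and this is not Griffin's scheduler code; it only illustrates which hour the 14:05 run in the example targets.

```python
from datetime import datetime, timedelta


def last_complete_hour(trigger_time: datetime) -> datetime:
    """Return the start of the most recent fully elapsed hour before trigger_time.

    Hypothetical helper for illustration only, not part of Griffin.
    """
    # Truncate the trigger time to the top of its hour (14:05 -> 14:00) ...
    current_hour_start = trigger_time.replace(minute=0, second=0, microsecond=0)
    # ... then step back one hour to reach the hour whose data is complete (13:00).
    return current_hour_start - timedelta(hours=1)


if __name__ == "__main__":
    print(last_complete_hour(datetime(2018, 4, 6, 14, 5)))
```

Using `timedelta` rather than decrementing the hour field directly means a run shortly after midnight correctly rolls back to 23:00 of the previous day.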

To achieve this, Griffin needs to know the pattern of the partition column
values, so we configure it in the "where" field, like "dt=#YYYYMMdd# AND
hour=#HH#". The Griffin service replaces each time pattern between the hash
tags with the formatted timestamp and generates a concrete string like
"dt=20180406 AND hour=13" to trigger the Spark job.

If you submit a Griffin job directly to the Spark cluster without using the
Griffin service, you need to configure "where" as "dt=20180406 AND hour=13"
directly, because the measure module needs the concrete where clause.

PS:
In this document:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md#data-connector,
the "partition" field is out of date; we configure it as "where" now, and
the definition should be:
*where: where conditions string, split by ";", optional. e.g. "dt=20170410
AND hour=15; dt=20170411 AND hour=15; dt=20170412 AND hour=15"*
It should read like a where clause, used to filter data. We'll update the
document later.
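As a small illustration of the ";"-separated format, here is a Python sketch that splits a "where" string into its individual conditions. The function names are hypothetical, and how the measure module actually combines multiple conditions is not described in this thread; `build_queries` simply pairs each condition with a hypothetical table name to show that each piece is an ordinary where clause on its own.

```python
from typing import List


def split_where(where: str) -> List[str]:
    """Split a ';'-separated where string into trimmed, non-empty conditions."""
    return [cond.strip() for cond in where.split(";") if cond.strip()]


def build_queries(table: str, where: str) -> List[str]:
    """Hypothetical: build one SELECT statement per where condition."""
    return [f"SELECT * FROM {table} WHERE {cond}" for cond in split_where(where)]


if __name__ == "__main__":
    example = "dt=20170410 AND hour=15; dt=20170411 AND hour=15; dt=20170412 AND hour=15"
    for query in build_queries("default.demo_src", example):
        print(query)
```

Trimming and dropping empty pieces keeps a trailing ";" or extra spaces in the configured string from producing blank conditions.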

Thanks,
Lionel

On Wed, Apr 25, 2018 at 5:30 PM, Karan Gupta <[email protected]> wrote:

> Hi Lionel,
>
> Could you help me understand the partition configuration under accuracy
> measure better with an example.
>
> Thank you,
>
> Karan Gupta
