Re: Partition Configuration

Lionel Liu Wed, 25 Apr 2018 03:57:13 -0700

Hi Karan,

There was a mistake in my last email: the conditions in the "where" field
should be separated by ",", not ";".
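
To make the two points concrete (the "#...#" time patterns described in the quoted message below, and the "," separator corrected above), here is a minimal Python sketch. It is not Griffin's actual implementation: the function names render_where and split_where are hypothetical, and the mapping of the pattern letters YYYY, MM, dd and HH to strftime codes is an assumption based only on the examples in this thread.

```python
import re
from datetime import datetime, timedelta

# Assumed mapping from the pattern letters used in this thread to strftime codes.
_TOKENS = [("YYYY", "%Y"), ("MM", "%m"), ("dd", "%d"), ("HH", "%H")]


def render_where(template, ts):
    """Replace every #...# time pattern in `template` with values taken from `ts`."""
    def expand(match):
        fmt = match.group(1)
        for token, code in _TOKENS:
            fmt = fmt.replace(token, code)
        return ts.strftime(fmt)

    return re.sub(r"#([^#]+)#", expand, template)


def split_where(where):
    """Split a "where" string into individual conditions on ','."""
    return [part.strip() for part in where.split(",") if part.strip()]


if __name__ == "__main__":
    # A job triggered at 14:05 that should read the previous hour's partition.
    trigger_time = datetime(2018, 4, 6, 14, 5)
    data_time = trigger_time - timedelta(hours=1)
    print(render_where("dt=#YYYYMMdd# AND hour=#HH#", data_time))
    print(split_where("dt=20170410 AND hour=15, dt=20170411 AND hour=15"))
```

Subtracting one hour from the trigger time mirrors the 14:05 example in the quoted message, where the job reads the partition hour=13.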


Thanks,
Lionel

On Wed, Apr 25, 2018 at 6:24 PM, Karan Gupta <[email protected]> wrote:

> Thanks Lionel,
>
> That was helpful.
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <[email protected]>
> Sent: Wednesday, April 25, 2018 3:49 PM
> To: Karan Gupta <[email protected]>
> Cc: [email protected]
> Subject: Re: Partition Configuration
>
> Hi Karan,
>
> Most hive tables in a production environment have partition columns, and
> it's common to use time as the partition column.
>
> For example, we have a hive table with partition column "dt" and "hour",
> with pattern like this: dt=20180406, hour=13.
>
> In griffin, we want to schedule a job to calculate the accuracy measure on
> the hive table hourly, but each time we only want to calculate the data of
> the last hour.
> e.g.: At 14:05, the data from 13:00 - 13:59 is ready, so we want to
> calculate the data saved in partition hour=13.
> For this usage, you can refer to
> https://cwiki.apache.org/confluence/display/GRIFFIN/5.+Griffin+Job+Scheduler+Design
> (griffin can schedule jobs with the data range by time).
>
> To achieve this goal, griffin needs to know the pattern of the partition
> column values, so we need to configure it in the "where" field, like
> "dt=#YYYYMMdd# AND hour=#HH#". The griffin service will replace each time
> pattern between hash signs with the timestamp and generate a concrete
> string like "dt=20180406 AND hour=13" to trigger the spark job.
>
> If you submit the griffin job directly to the spark cluster, without using
> the griffin service, you need to configure "where" as "dt=20180406 AND
> hour=13" directly, because the measure module needs the concrete where
> clause.
>
> PS:
> In this document:
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md#data-connector
> "partition" is out of date; we configure it as "where" now, and the
> definition should be:
> where: where conditions string, split by ";", optional. e.g. "dt=20170410
> AND hour=15; dt=20170411 AND hour=15; dt=20170412 AND hour=15"
> It should be like a where clause, to filter data. We'll modify the
> document later.
>
> Thanks,
> Lionel
>
> On Wed, Apr 25, 2018 at 5:30 PM, Karan Gupta <[email protected]> wrote:
> Hi Lionel,
>
> Could you help me understand the partition configuration under the
> accuracy measure better, with an example?
>
>
> Thank you,
> Karan Gupta
