Hi Ashish, thanks for the help!
1) Regarding the group-by-week, it was entirely my fault... I miswrote
the function. To get grouping by week one needs to have
FIELD: date_add('2007-12-31', (datediff(date_field, '2007-12-31') / 7) *
7)
GROUP BY: datediff(date_field, '2007-12-31') / 7
This then works, and you get both the grouping and the first day of the
week as the field value.
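
The arithmetic behind the week expression can be checked outside Hive with a
small Python sketch. It assumes the division truncates to a whole number of
weeks, which is what the grouping relies on; the names ANCHOR and week_start
are made up for illustration and are not Hive functions.

```python
from datetime import date, timedelta

# Same anchor date as in the Hive expression; 2007-12-31 was a Monday.
ANCHOR = date(2007, 12, 31)

def week_start(d):
    """Return the first day of the anchor-aligned week that contains d."""
    # Whole weeks elapsed since the anchor: this is the GROUP BY key.
    weeks = (d - ANCHOR).days // 7
    # Anchor plus that many whole weeks: this is the FIELD value.
    return ANCHOR + timedelta(days=weeks * 7)

print(week_start(date(2008, 4, 20)))  # 2008-04-14
```

Every date from 2008-04-14 through 2008-04-20 maps to the same week index and
the same week-start value, so the key and the displayed field stay consistent.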
2) However this is not so simple for the month:
> For getting numeric sorting you can do an order by CAST(month(pdate)
> AS INT). That will probably work for you?
This won't work, since pdate by itself is a date string, not a number,
and even if it were, the month dates would be:
2008-4-20, which comes after 2008-10-20 when sorting lexically.
Therefore the question is how to get zero padding with Hive.
I have this expression:
concat(concat(concat(CAST(year(date_field) as STRING), '-'),
CAST(month(date_field) as STRING)), '-1')
which produces dates like:
2008-4-1
but I want zero-padding, which means
2008-04-1
How does one do that with Hive?
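
To make the sorting problem concrete, here is a minimal Python sketch (not
Hive) of why the unpadded month sorts wrongly as a string and the padded one
does not. The helper name month_key is hypothetical; it only mirrors the
year-month-1 format built by the concat expression above.

```python
def month_key(year, month):
    """Build a 'YYYY-MM-1' string key with a zero-padded month."""
    return "%d-%02d-1" % (year, month)

# Unpadded keys, as the concat expression currently produces them.
unpadded = sorted(["2008-4-1", "2008-10-1"])
# Padded keys for the same two months.
padded = sorted([month_key(2008, 4), month_key(2008, 10)])

print(unpadded)  # ['2008-10-1', '2008-4-1']   October before April
print(padded)    # ['2008-04-1', '2008-10-1']  calendar order
```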
Is there a function that would take separate year, month and day values
and return a properly formatted date?
[btw from my experience support for handling dates in Hive is extremely
weak, and code becomes heavily convoluted to do it properly]
--
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: [email protected]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test