[ https://issues.apache.org/jira/browse/FLUME-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johny Rufus updated FLUME-2570:
-------------------------------
    Assignee:     (was: Johny Rufus)

> Add option to not pad date fields
> ---------------------------------
>
>                 Key: FLUME-2570
>                 URL: https://issues.apache.org/jira/browse/FLUME-2570
>             Project: Flume
>          Issue Type: New Feature
>          Components: Configuration
>    Affects Versions: v1.5.1
>            Reporter: Peter Leckie
>
> Although dates are conventionally zero-padded, it would be valuable if Flume
> could format the date components as plain integers, i.e. without padding.
> For example, using the %Y, %m and %d aliases to create output directories
> for today's date would produce paths like the following:
> /output/2014/3/5/
> This would be especially helpful when importing the data into Hive or Impala.
> First, Impala cannot pad partition values, so currently the only way to work
> with this data is to import it with Hive and then use Impala to access it
> (short of writing custom code).
> Second, padded partitions cause problems in Hive and Impala; for example,
> padded partitions cannot be pruned.
> The following is an example of a typical workflow:
> Data is imported into HDFS using Flume with the following sink setting:
>     agent.sinks.snk_avro_snappy.hdfs.path = hdfs://hdfs/avro/year=%Y/month=%m/day=%d
> Impala reads the data as follows:
>     create external table TestAvro (.....)
>     partitioned by (Year int, Month int, Day int) stored as avro
>     location '/avro';
>     alter table TestAvro add if not exists
>     partition(Year=cast(year(to_date(now())) as int),
>     Month=cast(month(to_date(now())) as int), Day=cast(day(to_date(now())) as int));
> Flume saves the output as:
>     hdfs://hdfs/avro/year=2014/month=12/day=01
> but Impala looks for it under:
>     hdfs://hdfs/avro/year=2014/month=12/day=1
> So this feature request is to add the ability for Flume to write data into a
> directory named with today's date, with no padding on the day or month field.
> Implementation details are not important; for example, Flume could add a
> macro that simply removes padding, instead of changing the existing date aliases.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
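A minimal sketch of the "macro which simply removes padding" idea, written in Java because Flume is. Nothing here is Flume API: `UnpaddedDatePath`, `stripPadding` and `expand` are hypothetical names, and `expand` is a simplified stand-in for Flume's escape handling that only knows `%Y`, `%m` and `%d` (producing the zero-padded output described in the issue). The sketch assumes padding is removed as a post-processing step on the already expanded path.

```java
import java.time.LocalDate;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of a padding-removing macro; not part of Flume.
 */
public final class UnpaddedDatePath {

    // Leading zeros at the start of a path segment (right after '/' or '='),
    // followed by at least one more digit and then '/' or end of string.
    // Requiring a trailing digit turns "00" into "0" rather than "".
    private static final Pattern LEADING_ZEROS =
            Pattern.compile("(?<=[/=])0+(?=\\d+(?:/|$))");

    private UnpaddedDatePath() {}

    /** Removes zero padding from every purely numeric path segment. */
    public static String stripPadding(String path) {
        return LEADING_ZEROS.matcher(path).replaceAll("");
    }

    /**
     * Simplified stand-in for escape expansion: replaces %Y with the year and
     * %m/%d with zero-padded month/day, matching the padded paths in the issue.
     */
    public static String expand(String template, LocalDate date) {
        return template
                .replace("%Y", String.valueOf(date.getYear()))
                .replace("%m", String.format("%02d", date.getMonthValue()))
                .replace("%d", String.format("%02d", date.getDayOfMonth()));
    }

    public static void main(String[] args) {
        String template = "hdfs://hdfs/avro/year=%Y/month=%m/day=%d";
        String padded = expand(template, LocalDate.of(2014, 12, 1));
        System.out.println(padded);               // hdfs://hdfs/avro/year=2014/month=12/day=01
        System.out.println(stripPadding(padded)); // hdfs://hdfs/avro/year=2014/month=12/day=1
        System.out.println(stripPadding("/output/2014/03/05/")); // /output/2014/3/5/
    }
}
```

One design caveat of this approach: the regex strips zeros from any all-digit segment, so a literal segment such as `batch=007` would also become `batch=7`. A real implementation would more likely apply unpadding only to the date fields it expands.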