[ 
https://issues.apache.org/jira/browse/FLUME-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johny Rufus updated FLUME-2570:
-------------------------------
    Assignee:     (was: Johny Rufus)

> Add option to not pad date fields
> ---------------------------------
>
>                 Key: FLUME-2570
>                 URL: https://issues.apache.org/jira/browse/FLUME-2570
>             Project: Flume
>          Issue Type: New Feature
>          Components: Configuration
>    Affects Versions: v1.5.1
>            Reporter: Peter Leckie
>
> Although dates are conventionally zero-padded, it would be valuable if Flume
> were able to format the date components as plain integers, i.e. without
> padding.
> For example, using the %Y, %m or %d alias to create output directories
> referencing today's date, like the following:
> /output/2014/3/5/
> This would be especially helpful when importing the data into either
> Hive or Impala.
> First, Impala cannot pad partition values, so currently the only way to do
> this, short of writing custom code, is to import the data with Hive and then
> use Impala to access it.
> Second, padding partitions in Hive or Impala causes issues; for example,
> pruning of padded partitions is not possible.
> The following is an example of a typical workflow:
> Data is imported into HDFS using Flume with a sink configured as follows:
> agent.sinks.snk_avro_snappy.hdfs.path = 
> hdfs://hdfs/avro/year=%Y/month=%m/day=%d
> Impala reads the data as follows:
> create external table TestAvro (.....)
> partitioned by (Year int, Month int, Day int) stored as avro
> location '/avro';
> alter table TestAvro add if not exists 
> partition(Year=cast(year(to_date(now())) as int), 
> Month=cast(month(to_date(now())) as int), Day=cast(day(to_date(now())) as 
> int));
> Flume saves the output as
> hdfs://hdfs/avro/year=2014/month=12/day=01
> And Impala reads it as:
> hdfs://hdfs/avro/year=2014/month=12/day=1
> So this feature request is to add the ability for Flume to write data into a
> directory named for today's date with no padding on the day or month field.
> Implementation details are not important; for example, a macro that simply
> removes padding could be added, instead of futzing with the date aliases.
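
The padded versus unpadded difference described in the issue can be sketched
in plain Java. This is only an illustration under stated assumptions: it uses
java.time rather than any Flume API, and the class and method names
(UnpaddedDatePath, paddedPath, unpaddedPath, stripPadding) are hypothetical,
not part of Flume. stripPadding is one possible reading of the "macro which
simply removes padding" idea, applied to an already formatted path.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not a Flume class: contrasts padded and unpadded
// partition paths for the same date.
public class UnpaddedDatePath {
    // Doubled pattern letters (MM, dd) zero-pad to two digits.
    private static final DateTimeFormatter PADDED =
        DateTimeFormatter.ofPattern("'year='yyyy'/month='MM'/day='dd");
    // Single pattern letters (M, d) print the minimum number of digits.
    private static final DateTimeFormatter UNPADDED =
        DateTimeFormatter.ofPattern("'year='yyyy'/month='M'/day='d");

    public static String paddedPath(LocalDate date) {
        return date.format(PADDED);
    }

    public static String unpaddedPath(LocalDate date) {
        return date.format(UNPADDED);
    }

    // Removes leading zeros from every numeric value that follows '=',
    // leaving a single digit in place if the value is all zeros.
    public static String stripPadding(String path) {
        return path.replaceAll("=0+(\\d)", "=$1");
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2014, 12, 1);
        System.out.println(paddedPath(date));               // year=2014/month=12/day=01
        System.out.println(unpaddedPath(date));             // year=2014/month=12/day=1
        System.out.println(stripPadding(paddedPath(date))); // year=2014/month=12/day=1
    }
}
```

Formatting without padding from the start avoids post-processing, while the
strip approach works on any path that has already been expanded; which of the
two would fit Flume better is left open here.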



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
