[jira] [Commented] (FLUME-2570) Add option to not pad date fields

Peter Leckie (JIRA) Sun, 14 Dec 2014 17:42:25 -0800

    [ https://issues.apache.org/jira/browse/FLUME-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246225#comment-14246225 ]


Peter Leckie commented on FLUME-2570:
-------------------------------------

[~gwenshap] Yes: what I need is some form of new alias (escape sequence), flag,
or padding-stripping macro that would allow the existing definition:
hdfs://hdfs/avro/year=%Y/month=%m/day=%d

to be translated into:
hdfs://hdfs/avro/year=2014/month=8/day=1

For ease of use, I would prefer two new aliases, for example:
hdfs://hdfs/avro/year=%Y/month=%u/day=%v

Where:
%u      non-padded month (1..12)
%v      non-padded day of month (1..31)
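
To make the proposal concrete, here is a minimal Python sketch of path-template expansion with the two proposed aliases. It is a standalone illustration under stated assumptions, not Flume's own escape-sequence code: `expand_path` and the `ESCAPES` table are hypothetical names, and %u and %v exist here only as the aliases proposed above.

```python
import re
from datetime import date

# Hypothetical escape table: %Y, %m and %d mirror the padded fields used in
# the existing sink path; %u and %v are the non-padded aliases proposed in
# this comment (they are not existing Flume escape sequences).
ESCAPES = {
    "Y": lambda d: f"{d.year:04d}",
    "m": lambda d: f"{d.month:02d}",  # padded month (01..12)
    "d": lambda d: f"{d.day:02d}",    # padded day of month (01..31)
    "u": lambda d: str(d.month),      # non-padded month (1..12)
    "v": lambda d: str(d.day),        # non-padded day of month (1..31)
}


def expand_path(template: str, when: date) -> str:
    """Replace each %X escape in the template with the formatted date field."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in ESCAPES:
            raise ValueError(f"unsupported escape sequence: %{key}")
        return ESCAPES[key](when)

    return re.sub(r"%(\w)", substitute, template)


when = date(2014, 8, 1)
print(expand_path("hdfs://hdfs/avro/year=%Y/month=%m/day=%d", when))
# hdfs://hdfs/avro/year=2014/month=08/day=01
print(expand_path("hdfs://hdfs/avro/year=%Y/month=%u/day=%v", when))
# hdfs://hdfs/avro/year=2014/month=8/day=1
```

Keeping %m and %d padded means existing sink paths are unaffected; only configurations that opt into the new aliases would change.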

> Add option to not pad date fields
> ---------------------------------
>
>                 Key: FLUME-2570
>                 URL: https://issues.apache.org/jira/browse/FLUME-2570
>             Project: Flume
>          Issue Type: New Feature
>          Components: Configuration
>    Affects Versions: v1.5.1
>            Reporter: Peter Leckie
>            Assignee: Johny Rufus
>
> Although dates are technically padded, it would be valuable if Flume were
> able to format the date components as plain integers, i.e. without padding.
> For example, using the %Y, %m, or %d alias to create output directories
> that reference today's date, like the following:
> /output/2014/3/5/
> This would be especially helpful when importing the data into Hive or
> Impala.
> First, Impala has no way to pad partition values, so currently the only
> option is to import the data with Hive and then use Impala to access it
> (short of writing custom code).
> Second, padded partition values cause problems in Hive and Impala; for
> example, padded partitions cannot be pruned.
> The following is an example of a typical workflow. Data is imported into
> HDFS using Flume with the following sink configuration:
> agent.sinks.snk_avro_snappy.hdfs.path = hdfs://hdfs/avro/year=%Y/month=%m/day=%d
> Impala reads the data as follows:
> create external table TestAvro (.....)
> partitioned by (Year int, Month int, Day int) stored as avro
> location '/avro';
> alter table TestAvro add if not exists
> partition(Year=cast(year(to_date(now())) as int),
>           Month=cast(month(to_date(now())) as int),
>           Day=cast(day(to_date(now())) as int));
> Flume saves the output as:
> hdfs://hdfs/avro/year=2014/month=12/day=01
> but Impala reads it as:
> hdfs://hdfs/avro/year=2014/month=12/day=1
> So this feature request is to add the ability for Flume to write data into a
> directory named with today's date, with no padding on the day or month
> fields.
> The implementation details are not important; for example, Flume could add a
> macro that simply removes padding instead of changing the existing date
> aliases.
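
The padding-stripping alternative mentioned at the end of the description could work as a post-processing step on an already-expanded path. A minimal Python sketch, assuming Hive-style `key=value` directory segments; `strip_partition_padding` is a hypothetical helper for illustration, not part of Flume:

```python
import re

# Matches a "key=value" path segment (directly after a "/") whose value is
# all digits, capturing the key and the digits left after leading zeros.
_PADDED_SEGMENT = re.compile(r"(?<=/)(\w+)=0*(\d+)(?=/|$)")


def strip_partition_padding(path: str) -> str:
    """Remove leading zeros from numeric key=value segments of a path.

    Hypothetical helper illustrating a padding-stripping "macro"; it is not
    a Flume API. A value made only of zeros collapses to a single "0", and
    segments with non-digit values are left unchanged.
    """
    return _PADDED_SEGMENT.sub(lambda m: f"{m.group(1)}={m.group(2)}", path)


print(strip_partition_padding("hdfs://hdfs/avro/year=2014/month=12/day=01"))
# hdfs://hdfs/avro/year=2014/month=12/day=1
```

Post-processing leaves the existing %m and %d aliases untouched, at the cost of only handling `key=value` segments; a plain layout such as /output/2014/03/05/ would need a different rule.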



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
