[ https://issues.apache.org/jira/browse/FLUME-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246225#comment-14246225 ]
Peter Leckie commented on FLUME-2570:
-------------------------------------
[~gwenshap] Yes, some form of new alias (escape sequence), flag, or
pad-stripping macro that would allow the existing definition:
hdfs://hdfs/avro/year=%Y/month=%m/day=%d
To translate into:
hdfs://hdfs/avro/year=2014/month=8/day=1
For simplicity of use, my preference would be 2 new aliases, for example:
hdfs://hdfs/avro/year=%Y/month=%u/day=%v
Where:
%u non padded month (1..12)
%v non padded day of month (1..31)
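To illustrate the proposal, here is a minimal Python sketch of how the two aliases might expand next to the existing padded ones. It is not Flume code: `expand_path` is a hypothetical helper, it handles only the five escapes shown, and `%u` and `%v` are the names suggested in this comment rather than anything Flume currently supports.

```python
from datetime import datetime


def expand_path(template: str, when: datetime) -> str:
    """Expand a small subset of date escapes in a path template.

    Hypothetical helper for illustration only; %u and %v are the
    aliases proposed in this comment, not existing Flume escapes.
    """
    replacements = {
        "%Y": f"{when.year:04d}",   # four-digit year
        "%m": f"{when.month:02d}",  # zero-padded month (01..12)
        "%d": f"{when.day:02d}",    # zero-padded day of month (01..31)
        "%u": str(when.month),      # proposed: non-padded month (1..12)
        "%v": str(when.day),        # proposed: non-padded day of month (1..31)
    }
    for escape, value in replacements.items():
        template = template.replace(escape, value)
    return template


when = datetime(2014, 8, 1)
print(expand_path("hdfs://hdfs/avro/year=%Y/month=%m/day=%d", when))
# hdfs://hdfs/avro/year=2014/month=08/day=01
print(expand_path("hdfs://hdfs/avro/year=%Y/month=%u/day=%v", when))
# hdfs://hdfs/avro/year=2014/month=8/day=1
```

Separate aliases would keep existing configurations unchanged, since `%m` and `%d` continue to produce padded values.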
> Add option to not pad date fields
> ---------------------------------
>
> Key: FLUME-2570
> URL: https://issues.apache.org/jira/browse/FLUME-2570
> Project: Flume
> Issue Type: New Feature
> Components: Configuration
> Affects Versions: v1.5.1
> Reporter: Peter Leckie
> Assignee: Johny Rufus
>
> Although dates are technically padded, it would be valuable if Flume were able
> to format the date components so that they are expressed like integers, i.e.
> not padded.
> For example, using the %Y, %m or %d alias to create output directories
> referencing today's date like the following:
> /output/2014/3/5/
> This would be especially helpful when importing the data into either
> Hive or Impala.
> First, Impala has no ability to pad partition values, so currently
> the only way to do this is to import the data with Hive and then use Impala to
> access it (short of writing custom code).
> Second, padding partitions in Hive or Impala causes issues; for example,
> pruning of padded partitions is not possible.
> The following is an example of a typical workflow:
> Data is imported into HDFS using Flume with a sink configured as follows:
> agent.sinks.snk_avro_snappy.hdfs.path =
> hdfs://hdfs/avro/year=%Y/month=%m/day=%d
> Impala reads the data as follows:
> create external table TestAvro (.....)
> partitioned by (Year int, Month int, Day int) stored as avro
> location '/avro';
> alter table TestAvro add if not exists
> partition(Year=cast(year(to_date(now())) as int),
> Month=cast(month(to_date(now())) as int),
> Day=cast(day(to_date(now())) as int));
> Flume saves the output as
> hdfs://hdfs/avro/year=2014/month=12/day=01
> But Impala looks for it at:
> hdfs://hdfs/avro/year=2014/month=12/day=1
> So this feature request is to give Flume the ability to write data into a
> directory named for today's date with no padding on the day or month field.
> Implementation details are not important; for example, a macro that
> simply removes padding could be added instead of futzing with the date aliases.
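The pad-stripping alternative mentioned at the end of the description could look roughly like the Python sketch below. It is an illustration only, not Flume's implementation: `strip_padding` is a hypothetical name, and it assumes padding only needs to be removed from purely numeric values in `key=value` path segments.

```python
import re

# Leading zeros in a numeric partition value that ends its path segment,
# e.g. "day=01/" or a trailing "day=01". Assumes key=value style segments.
_PADDED_VALUE = re.compile(r"=0+(\d+)(?=/|$)")


def strip_padding(path: str) -> str:
    """Remove zero padding from numeric key=value path segments.

    Hypothetical post-processing step applied after the usual date
    escapes have been expanded.
    """
    return _PADDED_VALUE.sub(r"=\1", path)


print(strip_padding("hdfs://hdfs/avro/year=2014/month=12/day=01"))
# hdfs://hdfs/avro/year=2014/month=12/day=1
```

A macro of this kind leaves the date aliases untouched, at the cost of guessing which path segments are meant to be numeric.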