Hi Jürgen,

Thank you for your patch; I think the proposed changes are valuable
additions to Flume.
Could you please file a Jira at https://issues.apache.org/jira/ and then
either attach the patches to it and upload them to Review Board
(https://reviews.apache.org) or, if that's easier for you, open a pull
request on GitHub (start by forking https://github.com/apache/flume/)?

You can find more details on how to contribute on this wiki page:
https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute
(Note: we have recently started experimenting with GitHub pull requests.
They are not mentioned on that page yet, but feel free to use one if you
prefer.)

I skimmed through your patches quickly and have the following comments:
- Could you please add tests? As a starting point, you could have a look
at TestSpoolDirectorySource.
- Adding the new config parameters to the documentation would be very
helpful, too.

Let us know if you have any questions or need assistance with submitting
the patch.

Kind regards,
Denes

On Tue, Oct 11, 2016 at 5:02 PM Jürgen Jakobitsch <[email protected]> wrote:

> Hi,
>
> For a project using the SpoolDirectorySource with an HDFS sink, I wanted
> to have the same (relative) directory structure in HDFS as in the spool
> directory, which uses subdirectories.
>
> To achieve this, I updated the following files (all in flume-ng-core)
>
> org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java
> org/apache/flume/source/SpoolDirectorySource.java
> org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java
>
> to add the following optional headers (analogous to basenameHeader):
>
> parentDirectory (the parent directory of the file)
>
> Example:
> spool directory: /var/lib/flume/data/
> file: /var/lib/flume/data/some/subdirectory/somefile.log
>
> parentDirectoryHeader = /var/lib/flume/data/some/subdirectory/
>
> relativeParentDirectory (the parent directory of the file, relative to
> the spool directory)
>
> Example:
> spool directory: /var/lib/flume/data/
> file: /var/lib/flume/data/some/subdirectory/somefile.log
>
> relativeParentDirectoryHeader = some/subdirectory/
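For illustration only: a minimal Java sketch of how the two header values described above could be computed with java.nio.file, assuming POSIX-style paths. The class and method names (ParentDirectoryHeaders, parentDirectory, relativeParentDirectory) are hypothetical and not taken from the attached patches; the trailing "/" is appended only to match the example values.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flume or of the attached patches.
public class ParentDirectoryHeaders {

  // Value for the parentDirectory header: the absolute parent directory of
  // the file, e.g. /var/lib/flume/data/some/subdirectory/
  static String parentDirectory(Path file) {
    Path parent = file.toAbsolutePath().normalize().getParent();
    return withTrailingSlash(parent.toString());
  }

  // Value for the relativeParentDirectory header: the parent directory of the
  // file relative to the spool directory, e.g. some/subdirectory/
  // Files sitting directly inside the spool directory yield an empty string.
  static String relativeParentDirectory(Path spoolDir, Path file) {
    Path base = spoolDir.toAbsolutePath().normalize();
    Path parent = file.toAbsolutePath().normalize().getParent();
    String relative = base.relativize(parent).toString();
    return relative.isEmpty() ? "" : withTrailingSlash(relative);
  }

  // Assumes "/" as the separator (POSIX paths), to match the examples.
  private static String withTrailingSlash(String path) {
    return path.endsWith("/") ? path : path + "/";
  }

  public static void main(String[] args) {
    Path spoolDir = Paths.get("/var/lib/flume/data/");
    Path file = Paths.get("/var/lib/flume/data/some/subdirectory/somefile.log");
    // On a POSIX file system this prints:
    // /var/lib/flume/data/some/subdirectory/
    // some/subdirectory/
    System.out.println(parentDirectory(file));
    System.out.println(relativeParentDirectory(spoolDir, file));
  }
}
```

Path.relativize does the actual work here; files outside the spool directory would produce ".." segments, which this sketch does not guard against.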
>
>
> I'm now using the following Flume config (excerpt) to mirror that folder
> structure in HDFS:
>
> flume.sources.dirSource.spoolDir = /var/lib/flume/data
> flume.sources.dirSource.recursiveDirectorySearch = true
> flume.sources.dirSource.basenameHeader = true
> flume.sources.dirSource.basenameHeaderKey = basename
> flume.sources.dirSource.relativeParentDirectoryHeader = true
> flume.sources.dirSource.relativeParentDirectoryHeaderKey = relativeParentDirectory
> ...
> flume.sinks.HDFS.type = hdfs
> flume.sinks.HDFS.hdfs.path = hdfs://bigdata.example.com:54310/application/root/directory/%{relativeParentDirectory}
> flume.sinks.HDFS.hdfs.fileType = DataStream
> flume.sinks.HDFS.hdfs.filePrefix = %{basename}
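The sink side of this setup relies on the %{header} escapes in hdfs.path and hdfs.filePrefix. The self-contained sketch below illustrates that substitution with a regular expression; it is a simplified stand-in, not Flume's actual escaping code (which also supports other escapes such as timestamp sequences), and the class name HeaderPathSketch is hypothetical. Header values are the ones from the example above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified illustration of %{header} substitution.
public class HeaderPathSketch {

  // Matches escapes of the form %{headerName}.
  private static final Pattern HEADER_ESCAPE = Pattern.compile("%\\{([^}]+)\\}");

  // Replaces every %{name} with the value of event header "name"; in this
  // sketch a missing header is replaced by an empty string.
  static String substitute(String template, Map<String, String> headers) {
    Matcher matcher = HEADER_ESCAPE.matcher(template);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      String value = headers.getOrDefault(matcher.group(1), "");
      // quoteReplacement keeps "$" or "\" in header values literal.
      matcher.appendReplacement(result, Matcher.quoteReplacement(value));
    }
    matcher.appendTail(result);
    return result.toString();
  }

  public static void main(String[] args) {
    // Headers as set for /var/lib/flume/data/some/subdirectory/somefile.log
    Map<String, String> headers = new HashMap<>();
    headers.put("basename", "somefile.log");
    headers.put("relativeParentDirectory", "some/subdirectory/");

    String path = substitute(
        "hdfs://bigdata.example.com:54310/application/root/directory/%{relativeParentDirectory}",
        headers);
    String prefix = substitute("%{basename}", headers);

    // hdfs://bigdata.example.com:54310/application/root/directory/some/subdirectory/
    System.out.println(path);
    // somefile.log
    System.out.println(prefix);
  }
}
```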
>
> Example:
>
> a file: /var/lib/flume/data/some/subdirectory/somefile.log
> is now stored as
>
> hdfs://bigdata.example.com:54310/application/root/directory/some/subdirectory/somefile.log.1476194723885
>
> I've attached three patches in case someone finds this useful. They were
> created against the trunk branch of
> http://git-wip-us.apache.org/repos/asf/flume.git, following the
> instructions in [1].
>
> krj
>
> [1] http://stackoverflow.com/questions/9396240/how-do-i-simply-create-a-patch-from-my-latest-git-commit
>
> *Jürgen Jakobitsch*
> Innovation Director
> Semantic Web Company GmbH
> EU: +43-1-4021235-0
> Mobile: +43-676-6212710
> http://www.semantic-web.at
> http://www.poolparty.biz
