Hi Jürgen,

Thank you for your patch. I think the proposed changes are valuable additions to Flume.

Could you please file a Jira at https://issues.apache.org/jira/ and either attach the patches to it and upload them to Reviewboard (https://reviews.apache.org) or, if it's easier for you, open a pull request on GitHub (start by forking https://github.com/apache/flume/)?

You can find more details on how to contribute on this wiki page:
https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute
(Note: we are experimenting with GitHub pull requests nowadays. This is not mentioned in that doc, but feel free to use one if you prefer.)

I skimmed through your patches quickly and have the following comments:

- Could you please add tests? As a start, you can have a look at TestSpoolDirectorySource, for example.
- Adding the new config parameters to the documentation would be very helpful, too.

Let us know if you have any questions or need assistance with submitting the patch.

Kind regards,
Denes

On Tue, Oct 11, 2016 at 5:02 PM Jürgen Jakobitsch <[email protected]> wrote:

> hi,
>
> for a project using the SpoolingDirectorySource with a HDFS sink i wanted
> to have the same (relative) directory structure in HDFS as in the spool
> directory, which uses subdirectories.
>
> to achieve this i updated (all in flume-ng-core)
>
> org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java
> org/apache/flume/source/SpoolDirectorySource.java
> org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java
>
> to include the following additional (optional) headers (analog to
> basenameHeader):
>
> parentDirectory (the parent directory of the file)
>
> example:
> spooldirectory: /var/lib/flume/data/
> file: /var/lib/flume/data/some/subdirectory/somefile.log
>
> parentDirectoryHeader = /var/lib/flume/data/some/subdirectory/
>
> relativeParentDirectory (the parent directory of the file relative from
> the spooldirectory)
>
> example:
> spooldirectory: /var/lib/flume/data/
> file: /var/lib/flume/data/some/subdirectory/somefile.log
>
> relativeParentDirectoryHeader = some/subdirectory/
>
>
> i'm now using the following flume config (excerpt) to get a nice folder
> structure in HDFS:
>
> flume.sources.dirSource.spoolDir = /var/lib/flume/data
> flume.sources.dirSource.recursiveDirectorySearch = true
> flume.sources.dirSource.basenameHeader = true
> flume.sources.dirSource.basenameHeaderKey = basename
> flume.sources.dirSource.relativeParentDirectoryHeader = true
> flume.sources.dirSource.relativeParentDirectoryHeaderKey = relativeParentDirectory
> ...
> flume.sinks.HDFS.type = hdfs
> flume.sinks.HDFS.hdfs.path = hdfs://bigdata.example.com:54310/application/root/directory/%{relativeParentDirectory}
> flume.sinks.HDFS.hdfs.fileType = DataStream
> flume.sinks.HDFS.hdfs.filePrefix = %{basename}
>
> example:
>
> a file : /var/lib/flume/data/some/subdirectory/somefile.log
> would now be stored in
>
> hdfs://bigdata.example.com:54310/application/root/directory/some/subdirectory/somefile.log.1476194723885
>
> i attach three patches in case someone finds this useful
> (used: http://git-wip-us.apache.org/repos/asf/flume.git => branch : trunk
> and instructions from here [1] to create the patch)
>
> krj
>
> [1]
> http://stackoverflow.com/questions/9396240/how-do-i-simply-create-a-patch-from-my-latest-git-commit
>
> *Jürgen Jakobitsch*
> Innovation Director
> Semantic Web Company GmbH
> EU: +43-1-4021235-0
> Mobile: +43-676-6212710
> http://www.semantic-web.at
> http://www.poolparty.biz
>
> PERSONAL INFORMATION
> | web : http://www.turnguard.com
> | foaf : http://www.turnguard.com/turnguard
> | g+ : https://plus.google.com/111233759991616358206/posts
> | skype : jakobitsch-punkt
> | xmlns:tg = "http://www.turnguard.com/turnguard#"
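
For readers following the thread, here is a minimal sketch of how the two header values in the quoted examples (parentDirectory and relativeParentDirectory) could be derived from the spool directory and a file path. This is not the code from the attached patches. The class name ParentDirectoryHeaderSketch and both method names are hypothetical, and the sketch assumes a Unix-like file system with "/" as the separator. It uses only java.nio.file.Path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not taken from the Flume patches in this thread.
public class ParentDirectoryHeaderSketch {

    // Appends a trailing slash unless the value is empty or already ends with one.
    private static String withTrailingSlash(String dir) {
        return (dir.isEmpty() || dir.endsWith("/")) ? dir : dir + "/";
    }

    // Absolute parent directory of the file, e.g. /var/lib/flume/data/some/subdirectory/
    public static String parentDirectory(Path file) {
        Path parent = file.toAbsolutePath().normalize().getParent();
        return withTrailingSlash(parent.toString());
    }

    // Parent directory of the file relative to the spool directory, e.g. some/subdirectory/
    // Returns an empty string when the file sits directly in the spool directory.
    public static String relativeParentDirectory(Path spoolDir, Path file) {
        Path parent = file.toAbsolutePath().normalize().getParent();
        Path relative = spoolDir.toAbsolutePath().normalize().relativize(parent);
        return withTrailingSlash(relative.toString());
    }

    public static void main(String[] args) {
        // Values from the example in the quoted mail.
        Path spoolDir = Paths.get("/var/lib/flume/data/");
        Path file = Paths.get("/var/lib/flume/data/some/subdirectory/somefile.log");

        System.out.println(parentDirectory(file));
        System.out.println(relativeParentDirectory(spoolDir, file));
    }
}
```

A unit test for the real change could assert on header values computed this way for files placed in nested subdirectories of a temporary spool directory.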
