[ 
https://issues.apache.org/jira/browse/STORM-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660533#comment-14660533
 ] 

ASF GitHub Bot commented on STORM-828:
--------------------------------------

GitHub user redsanket opened a pull request:

    https://github.com/apache/storm/pull/668

    STORM-828 HdfsBolt takes a lot of configuration, need good defaults

    Removing configs from HDFSFileTopology example. I have made neccessary 
config and constructor changes as per my understanding. It will be nice to know 
if there is anything I need to look at.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/redsanket/storm STORM-828

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #668
    
----
commit f09b7e55461194c2c03aedf569a739795cfc87ee
Author: Sanket <schintap@untilservice-lm>
Date:   2015-08-04T06:19:21Z

    STORM-828 removing unnecessary config information from HdfsBolt

----


> HdfsBolt takes a lot of configuration, need good defaults
> ---------------------------------------------------------
>
>                 Key: STORM-828
>                 URL: https://issues.apache.org/jira/browse/STORM-828
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>            Assignee: Sanket Reddy
>
> The following is code from 
> https://github.com/apache/storm/blob/master/external/storm-hdfs/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java
>  representing the amount of configuration required to use the HdfsBolt. 
> {code}
>         // sync the filesystem after every 1k tuples
>         SyncPolicy syncPolicy = new CountSyncPolicy(1000);
>         // rotate files every 1 min
>         FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, 
> TimedRotationPolicy.TimeUnit.MINUTES);
>         FileNameFormat fileNameFormat = new DefaultFileNameFormat()
>                 .withPath("/tmp/foo/")
>                 .withExtension(".txt");
>         RecordFormat format = new DelimitedRecordFormat()
>                 .withFieldDelimiter("|");
>         Yaml yaml = new Yaml();
>         InputStream in = new FileInputStream(args[1]);
>         Map<String, Object> yamlConf = (Map<String, Object>) yaml.load(in);
>         in.close();
>         config.put("hdfs.config", yamlConf);
>         HdfsBolt bolt = new HdfsBolt()
>                 .withConfigKey("hdfs.config")
>                 .withFsUrl(args[0])
>                 .withFileNameFormat(fileNameFormat)
>                 .withRecordFormat(format)
>                 .withRotationPolicy(rotationPolicy)
>                 .withSyncPolicy(syncPolicy)
>                 .addRotationAction(new 
> MoveFileAction().toDestination("/tmp/dest2/"));
> {code}
> This is way too much.  If it were just an example showing all of the 
> possibilities that would be OK but of the 8 lines used in the construction of 
> the bolt, 5 of them are required or the bolt will blow up at run time.  We 
> should provide reasonable defaults for everything that can have a reasonable 
> default.  And required parameters should be passed in through the 
> constructor, not as builder arguments.  I realize we need to maintain 
> backwards compatibility so we may need some new Bolt definitions.
> {code}
> HdfsTSVBolt bolt = new HdfsTSVBolt(outputDir);
> {code}
> If someone wanted to sync every 100 records instead of every 1000 we could do
> {code}
> TSVFileBolt bolt = new TSVFileBolt(outputDir).withSyncPolicy(new 
> CountSyncPolicy(100))
> {code}
> I would like to see a base HdfsFileBolt that requires a record format, and an 
> output directory.  It would have defaults for everything else.  Then we could 
> have a TSVFileBolt and CSVFileBolt subclass it and ideally SequenceFileBolt 
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to