[ 
https://issues.apache.org/jira/browse/STORM-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725399#comment-14725399
 ] 

Sanket Reddy commented on STORM-828:
------------------------------------

I could make it more specific by fixing the field delimiter of the delimited 
record format to "," for CSV and setting a default record format for TSV. But 
you are right: it depends on whether we want the bolts to be that specific or 
leave it to the user to set the config. It would be helpful if you could give 
your opinion on how I should proceed.
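
To make the two options concrete, here is a minimal sketch of what "specific" 
defaults could look like. It is plain Java with no Storm dependency, and 
`DelimitedFormatSketch`, `csv()` and `tsv()` are hypothetical names used only 
for illustration; they are not the storm-hdfs API. The presets fix the field 
delimiter, while `withFieldDelimiter` still lets the user override it.

```java
import java.util.List;

// Hypothetical stand-in for a delimited record format (not the storm-hdfs class).
final class DelimitedFormatSketch {
    private final String fieldDelimiter;
    private final String recordDelimiter;

    private DelimitedFormatSketch(String fieldDelimiter, String recordDelimiter) {
        this.fieldDelimiter = fieldDelimiter;
        this.recordDelimiter = recordDelimiter;
    }

    // Preset for CSV: fields separated by a comma, one record per line.
    static DelimitedFormatSketch csv() {
        return new DelimitedFormatSketch(",", "\n");
    }

    // Preset for TSV: fields separated by a tab, one record per line.
    static DelimitedFormatSketch tsv() {
        return new DelimitedFormatSketch("\t", "\n");
    }

    // Returns a copy with a different field delimiter, so the presets stay overridable.
    DelimitedFormatSketch withFieldDelimiter(String delimiter) {
        return new DelimitedFormatSketch(delimiter, recordDelimiter);
    }

    // Joins the field values with the field delimiter and appends the record delimiter.
    String format(List<?> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(fieldDelimiter);
            }
            sb.append(fields.get(i));
        }
        return sb.append(recordDelimiter).toString();
    }
}
```

With this shape, a CSV bolt would simply start from `csv()` and a TSV bolt from 
`tsv()`, and anyone who needs "|" can still call `withFieldDelimiter("|")`.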

> HdfsBolt takes a lot of configuration, need good defaults
> ---------------------------------------------------------
>
>                 Key: STORM-828
>                 URL: https://issues.apache.org/jira/browse/STORM-828
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>            Assignee: Sanket Reddy
>
> The following is code from 
> https://github.com/apache/storm/blob/master/external/storm-hdfs/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java
>  representing the amount of configuration required to use the HdfsBolt. 
> {code}
>         // sync the filesystem after every 1k tuples
>         SyncPolicy syncPolicy = new CountSyncPolicy(1000);
>         // rotate files every 1 min
>         FileRotationPolicy rotationPolicy =
>                 new TimedRotationPolicy(1.0f, TimedRotationPolicy.TimeUnit.MINUTES);
>         FileNameFormat fileNameFormat = new DefaultFileNameFormat()
>                 .withPath("/tmp/foo/")
>                 .withExtension(".txt");
>         RecordFormat format = new DelimitedRecordFormat()
>                 .withFieldDelimiter("|");
>         Yaml yaml = new Yaml();
>         InputStream in = new FileInputStream(args[1]);
>         Map<String, Object> yamlConf = (Map<String, Object>) yaml.load(in);
>         in.close();
>         config.put("hdfs.config", yamlConf);
>         HdfsBolt bolt = new HdfsBolt()
>                 .withConfigKey("hdfs.config")
>                 .withFsUrl(args[0])
>                 .withFileNameFormat(fileNameFormat)
>                 .withRecordFormat(format)
>                 .withRotationPolicy(rotationPolicy)
>                 .withSyncPolicy(syncPolicy)
>                 .addRotationAction(new MoveFileAction().toDestination("/tmp/dest2/"));
> {code}
> This is way too much.  If it were just an example showing all of the 
> possibilities, that would be OK, but of the 8 lines used in the construction 
> of the bolt, 5 are required or the bolt will blow up at run time.  We 
> should provide reasonable defaults for everything that can have a reasonable 
> default.  And required parameters should be passed in through the 
> constructor, not as builder arguments.  I realize we need to maintain 
> backwards compatibility so we may need some new Bolt definitions.
> {code}
> TSVFileBolt bolt = new TSVFileBolt(outputDir);
> {code}
> If someone wanted to sync every 100 records instead of every 1000, we could do:
> {code}
> TSVFileBolt bolt = new TSVFileBolt(outputDir).withSyncPolicy(new CountSyncPolicy(100));
> {code}
> I would like to see a base HdfsFileBolt that requires a record format and an 
> output directory.  It would have defaults for everything else.  Then we could 
> have TSVFileBolt and CSVFileBolt subclass it, and ideally SequenceFileBolt 
> as well.
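
The layout proposed in the quoted description (required values in the 
constructor, defaults for the rest, thin TSV/CSV subclasses) could be sketched 
roughly as follows. This is plain Java with no Storm dependency; every class 
and method name (`FileBoltSketch`, `TsvFileBoltSketch`, `CsvFileBoltSketch`, 
`withSyncCount`) is hypothetical, and a plain integer stands in for a sync 
policy. The default of 1000 is taken from the example in the description.

```java
import java.util.Objects;

// Hypothetical base class: required values go through the constructor,
// everything else has a default that a fluent setter can override.
abstract class FileBoltSketch<T extends FileBoltSketch<T>> {
    static final int DEFAULT_SYNC_COUNT = 1000;

    private final String outputDir;
    private final String fieldDelimiter;
    private int syncCount = DEFAULT_SYNC_COUNT;

    protected FileBoltSketch(String outputDir, String fieldDelimiter) {
        // Fail at construction time instead of at run time inside the topology.
        this.outputDir = Objects.requireNonNull(outputDir, "outputDir");
        this.fieldDelimiter = Objects.requireNonNull(fieldDelimiter, "fieldDelimiter");
    }

    // Lets fluent setters return the concrete subclass type.
    protected abstract T self();

    T withSyncCount(int syncCount) {
        if (syncCount <= 0) {
            throw new IllegalArgumentException("syncCount must be positive");
        }
        this.syncCount = syncCount;
        return self();
    }

    String outputDir() { return outputDir; }
    String fieldDelimiter() { return fieldDelimiter; }
    int syncCount() { return syncCount; }
}

// Hypothetical TSV variant: only the output directory is required.
final class TsvFileBoltSketch extends FileBoltSketch<TsvFileBoltSketch> {
    TsvFileBoltSketch(String outputDir) { super(outputDir, "\t"); }
    @Override protected TsvFileBoltSketch self() { return this; }
}

// Hypothetical CSV variant: only the output directory is required.
final class CsvFileBoltSketch extends FileBoltSketch<CsvFileBoltSketch> {
    CsvFileBoltSketch(String outputDir) { super(outputDir, ","); }
    @Override protected CsvFileBoltSketch self() { return this; }
}
```

The self-typed generic parameter is one way to keep a chained call such as 
`new TsvFileBoltSketch(dir).withSyncCount(100)` assignable to the subclass 
type, which the one-line examples in the description rely on.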


