[
https://issues.apache.org/jira/browse/STORM-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660672#comment-14660672
]
ASF GitHub Bot commented on STORM-828:
--------------------------------------
Github user redsanket commented on a diff in the pull request:
https://github.com/apache/storm/pull/668#discussion_r36456579
--- Diff:
external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/CSVFileBolt.java
---
@@ -0,0 +1,32 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.storm.hdfs.bolt;
+
+import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
+import org.apache.storm.hdfs.bolt.format.RecordFormat;
+import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
+import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
+import org.apache.storm.hdfs.common.rotation.MoveFileAction;
+
+public class CSVFileBolt extends HdfsBolt {
+ private static String fileExtension = ".csv";
+
+ public CSVFileBolt(String sourceDir, String destDir) {
+ super(sourceDir, destDir, fileExtension);
+ }
+}
--- End diff --
I guess I could set the RecordDefaultDelimiter to ",". Yes, thanks for
pointing it out. I will make the changes and put it up soon.
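As a rough sketch of that change, using stand-in classes rather than the real storm-hdfs types: the CSV bolt would pass both the ".csv" extension and a "," field delimiter to its parent, so callers only supply the output directory. Every class and method name here (DelimitedFormatSketch, FileBoltSketch, CsvFileBoltSketch, withSyncCount, formatRecord, fileName) is hypothetical and only illustrates the idea; the tab default and the 1000-tuple sync default are assumptions, and no CSV quoting or escaping is attempted.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in for a delimited record format; NOT the storm-hdfs class.
class DelimitedFormatSketch {
    private String fieldDelimiter = "\t"; // assumed default, overridden by subclasses

    DelimitedFormatSketch withFieldDelimiter(String delimiter) {
        this.fieldDelimiter = delimiter;
        return this;
    }

    // Joins the fields with the delimiter and terminates the record with a newline.
    String format(List<?> fields) {
        StringJoiner joiner = new StringJoiner(fieldDelimiter, "", "\n");
        for (Object field : fields) {
            joiner.add(String.valueOf(field));
        }
        return joiner.toString();
    }
}

// Hypothetical base bolt: required values go through the constructor,
// everything else has a default that a builder-style method can override.
class FileBoltSketch {
    private final String outputDir;
    private final DelimitedFormatSketch format;
    private final String extension;
    private int syncCount = 1000; // assumed default: sync every 1k tuples

    FileBoltSketch(String outputDir, DelimitedFormatSketch format, String extension) {
        this.outputDir = outputDir;
        this.format = format;
        this.extension = extension;
    }

    FileBoltSketch withSyncCount(int count) {
        this.syncCount = count;
        return this;
    }

    int syncCount() {
        return syncCount;
    }

    String formatRecord(List<?> fields) {
        return format.format(fields);
    }

    String fileName(int index) {
        return outputDir + "/part-" + index + extension;
    }
}

// CSV flavour: fixes the extension and the "," delimiter so callers need not.
class CsvFileBoltSketch extends FileBoltSketch {
    private static final String FILE_EXTENSION = ".csv";

    CsvFileBoltSketch(String outputDir) {
        super(outputDir, new DelimitedFormatSketch().withFieldDelimiter(","), FILE_EXTENSION);
    }
}
```

With this shape, `new CsvFileBoltSketch("/tmp/out")` is enough to get comma-separated records, and `.withSyncCount(100)` overrides the one default a caller is likely to change.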
> HdfsBolt takes a lot of configuration, need good defaults
> ---------------------------------------------------------
>
> Key: STORM-828
> URL: https://issues.apache.org/jira/browse/STORM-828
> Project: Apache Storm
> Issue Type: Improvement
> Reporter: Robert Joseph Evans
> Assignee: Sanket Reddy
>
> The following is code from
> https://github.com/apache/storm/blob/master/external/storm-hdfs/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java
> representing the amount of configuration required to use the HdfsBolt.
> {code}
> // sync the filesystem after every 1k tuples
> SyncPolicy syncPolicy = new CountSyncPolicy(1000);
> // rotate files every 1 min
> FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f,
> TimedRotationPolicy.TimeUnit.MINUTES);
> FileNameFormat fileNameFormat = new DefaultFileNameFormat()
> .withPath("/tmp/foo/")
> .withExtension(".txt");
> RecordFormat format = new DelimitedRecordFormat()
> .withFieldDelimiter("|");
> Yaml yaml = new Yaml();
> InputStream in = new FileInputStream(args[1]);
> Map<String, Object> yamlConf = (Map<String, Object>) yaml.load(in);
> in.close();
> config.put("hdfs.config", yamlConf);
> HdfsBolt bolt = new HdfsBolt()
> .withConfigKey("hdfs.config")
> .withFsUrl(args[0])
> .withFileNameFormat(fileNameFormat)
> .withRecordFormat(format)
> .withRotationPolicy(rotationPolicy)
> .withSyncPolicy(syncPolicy)
> .addRotationAction(new MoveFileAction().toDestination("/tmp/dest2/"));
> {code}
> This is way too much. If it were just an example showing all of the
> possibilities, that would be OK, but of the 8 lines used to construct
> the bolt, 5 are required or the bolt will blow up at run time. We
> should provide reasonable defaults for everything that can have a reasonable
> default. And required parameters should be passed in through the
> constructor, not as builder arguments. I realize we need to maintain
> backwards compatibility so we may need some new Bolt definitions.
> {code}
> TSVFileBolt bolt = new TSVFileBolt(outputDir);
> {code}
> If someone wanted to sync every 100 records instead of every 1000 we could do
> {code}
> TSVFileBolt bolt = new TSVFileBolt(outputDir)
> .withSyncPolicy(new CountSyncPolicy(100));
> {code}
> I would like to see a base HdfsFileBolt that requires a record format, and an
> output directory. It would have defaults for everything else. Then we could
> have a TSVFileBolt and CSVFileBolt subclass it and ideally SequenceFileBolt
> as well.