[
https://issues.apache.org/jira/browse/STORM-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196265#comment-15196265
]
ASF GitHub Bot commented on STORM-1464:
---------------------------------------
Github user dossett commented on a diff in the pull request:
https://github.com/apache/storm/pull/1044#discussion_r56241176
--- Diff: external/storm-hdfs/README.md ---
@@ -240,6 +240,23 @@ If you are using Trident and sequence files you can do
something like this:
.addRotationAction(new
MoveFileAction().withDestination("/dest2/"));
```
+### Data Partitioning
+Data can be partitioned to different HDFS directories based on
characteristics of the tuple being processed or purely
+external factors, such as system time. To partition your data, write
a class that implements the ```Partitioner```
+interface and pass it to the withPartitioner() method of your bolt. The
getPartitionPath() method returns a partition
+path for a given tuple.
+
+Here's an example of a Partitioner that operates on a specific field of
data:
+
+```java
+
+ Partitioner partitoner = new Partitioner() {
+ @Override
+ public String getPartitionPath(Tuple tuple) {
+ return Path.SEPARATOR + "city=" +
tuple.getStringByField("city");
--- End diff --
I thought about having Partitioner return an actual path, but decided
against it for two reasons:
- I liked the idea of the "partition" being solely a function of the tuple,
without reference to anything else.
- Since end users implement a Partitioner, having it return a complete path
would give the user access to details otherwise hidden from their code.
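
To make that split concrete, here is a rough sketch of the idea rather than the code in this PR. `FieldSource` is a hypothetical stand-in for Storm's `Tuple`, `fullPath` is a hypothetical helper standing in for whatever the bolt does internally, and the base directory and file name are made-up values. The Partitioner only ever sees the tuple; everything else about the final path stays on the bolt side.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionPathSketch {

    /** Hypothetical stand-in for Storm's Tuple; only the accessor used here. */
    interface FieldSource {
        String getStringByField(String field);
    }

    /** Simplified Partitioner: returns only the partition fragment for a tuple. */
    interface Partitioner {
        String getPartitionPath(FieldSource tuple);
    }

    /**
     * Hypothetical writer-side helper: the base directory and file name are
     * known only here, so user code never sees the complete path.
     */
    static String fullPath(String baseDir, Partitioner partitioner,
                           FieldSource tuple, String fileName) {
        return baseDir + partitioner.getPartitionPath(tuple) + "/" + fileName;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("city", "Austin");
        FieldSource tuple = fields::get;

        // User-supplied logic: purely a function of the tuple.
        Partitioner byCity = t -> "/city=" + t.getStringByField("city");

        // Prints /data/out/city=Austin/part-0.txt
        System.out.println(fullPath("/data/out", byCity, tuple, "part-0.txt"));
    }
}
```

If the Partitioner returned the complete path instead, the base directory and file naming would have to be passed into user code, which is exactly the coupling the two points above try to avoid.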
> storm-hdfs should support writing to multiple files
> ---------------------------------------------------
>
> Key: STORM-1464
> URL: https://issues.apache.org/jira/browse/STORM-1464
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-hdfs
> Reporter: Aaron Dossett
> Assignee: Aaron Dossett
> Labels: avro
>
> Examples of when this is needed include:
> - One Avro bolt writing multiple schemas, each of which requires a different
> file. Schema evolution is a common use of Avro, and the Avro bolt should
> support that seamlessly.
> - Partitioning output to different directories based on the tuple contents.
> For example, if the tuple contains a "USER" field, it should be possible to
> partition based on that value.