[jira] [Commented] (STORM-1464) storm-hdfs should support writing to multiple files

ASF GitHub Bot (JIRA) Tue, 15 Mar 2016 14:09:55 -0700

    [ https://issues.apache.org/jira/browse/STORM-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196265#comment-15196265 ]


ASF GitHub Bot commented on STORM-1464:
---------------------------------------

Github user dossett commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1044#discussion_r56241176
  
    --- Diff: external/storm-hdfs/README.md ---
    @@ -240,6 +240,23 @@ If you are using Trident and sequence files you can do something like this:
                     .addRotationAction(new MoveFileAction().withDestination("/dest2/"));
     ```
     
    +### Data Partitioning
    +Data can be partitioned to different HDFS directories based on characteristics of the tuple being processed or purely
    +external factors, such as system time.  To partition your data, write a class that implements the ```Partitioner```
    +interface and pass it to the withPartitioner() method of your bolt. The getPartitionPath() method returns a partition
    +path for a given tuple.
    +
    +Here's an example of a Partitioner that operates on a specific field of data:
    +
    +```java
    +
    +    Partitioner partitioner = new Partitioner() {
    +            @Override
    +            public String getPartitionPath(Tuple tuple) {
    +                return Path.SEPARATOR + "city=" + tuple.getStringByField("city");
    --- End diff --
    
    I thought about having Partitioner return an actual path but decided against it for two reasons:
    - I liked the idea of the "partition" being solely a function of the tuple, without reference to anything else.
    - Since end users implement a Partitioner, having it return a complete path would give their code access to details that are otherwise hidden from it.
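To make the design choice in the comment above concrete, here is a minimal, self-contained sketch of how a bolt could keep the base directory to itself and only append the relative partition fragment returned by the user's Partitioner. It is an illustration under stated assumptions, not the storm-hdfs implementation: `FakeTuple` is a hypothetical stand-in for Storm's `Tuple`, `resolvePath` is a hypothetical helper, and the literal `"/"` stands in for Hadoop's `Path.SEPARATOR`.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionPathDemo {

    // Hypothetical stand-in for Storm's Tuple; only lookup by field name is modeled.
    static final class FakeTuple {
        private final Map<String, String> fields;
        FakeTuple(Map<String, String> fields) { this.fields = fields; }
        String getStringByField(String name) { return fields.get(name); }
    }

    // Mirrors the README text: the partition is computed from the tuple alone.
    interface Partitioner {
        String getPartitionPath(FakeTuple tuple);
    }

    // Hypothetical bolt-side helper: the bolt owns basePath and appends the
    // relative partition path, so user code never sees the full HDFS path.
    static String resolvePath(String basePath, Partitioner partitioner, FakeTuple tuple) {
        return basePath + partitioner.getPartitionPath(tuple);
    }

    public static void main(String[] args) {
        // "/" plays the role of Hadoop's Path.SEPARATOR in this sketch.
        Partitioner byCity = tuple -> "/city=" + tuple.getStringByField("city");
        Map<String, String> fields = new HashMap<>();
        fields.put("city", "chicago");
        System.out.println(resolvePath("/data/events", byCity, new FakeTuple(fields)));
    }
}
```

Because the Partitioner only sees the tuple, the same implementation works unchanged if the bolt's output directory is reconfigured.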


> storm-hdfs should support writing to multiple files
> ---------------------------------------------------
>
>                 Key: STORM-1464
>                 URL: https://issues.apache.org/jira/browse/STORM-1464
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-hdfs
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>              Labels: avro
>
> Examples of when this is needed include:
> - One Avro bolt writing multiple schemas, each of which requires a different 
> file. Schema evolution is a common use of Avro, and the Avro bolt should 
> support it seamlessly.
> - Partitioning output to different directories based on the tuple contents.  
> For example, if the tuple contains a "USER" field, it should be possible to 
> partition based on that value.
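Writing to multiple files, as the issue description asks, implies keeping one open output per partition. The sketch below shows that idea for the "USER" field example: records are routed by a partition function and each distinct path lazily gets its own output. It is a hypothetical illustration, not storm-hdfs code: `MultiFileRouter` and `BufferedFile` are invented names, and the in-memory `BufferedFile` stands in for a real HDFS output stream.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class MultiFileRouter {

    // Hypothetical stand-in for an open HDFS file; it only buffers lines in memory.
    static final class BufferedFile {
        final StringBuilder contents = new StringBuilder();
        void writeLine(String line) { contents.append(line).append('\n'); }
    }

    private final String basePath;
    private final Function<Map<String, String>, String> partitioner;
    // One open "file" per resolved partition path, created on first use.
    private final Map<String, BufferedFile> openFiles = new LinkedHashMap<>();

    MultiFileRouter(String basePath, Function<Map<String, String>, String> partitioner) {
        this.basePath = basePath;
        this.partitioner = partitioner;
    }

    // Routes a record to the file for its partition, opening it if needed.
    void write(Map<String, String> record, String line) {
        String path = basePath + partitioner.apply(record);
        openFiles.computeIfAbsent(path, p -> new BufferedFile()).writeLine(line);
    }

    Map<String, BufferedFile> files() { return openFiles; }

    public static void main(String[] args) {
        MultiFileRouter router =
                new MultiFileRouter("/data/events", r -> "/user=" + r.get("USER"));
        router.write(Map.of("USER", "alice"), "login");
        router.write(Map.of("USER", "bob"), "login");
        router.write(Map.of("USER", "alice"), "logout");
        // Prints each partition path followed by the lines written to it.
        router.files().forEach((path, f) -> System.out.print(path + "\n" + f.contents));
    }
}
```

The same routing shape would cover the Avro case if the key were derived from the record's schema instead of a field value; a real bolt would also need to close and rotate these per-partition files.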



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
