[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182938#comment-15182938
 ] 

Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------

[Yogi]

Thomas,

I agree that toString() may not give useful output for most objects.
But my intent was to keep csv/json/avro conversion separate from this 
operator. 

The same conversions will be required for a few other stores. How about having 
a separate POJO to csv/json/avro converter upstream of this operator, which 
would emit byte[] or String?

I mentioned toString() in the proposal on the assumption that the operator 
would receive byte[] or String, which is what gives useful output. But if it 
receives some other type, it can fall back to toString() to convert the tuple 
to a byte[] instead of throwing an error.
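
To make the fallback concrete, here is a minimal sketch in plain Java. The class 
name TupleBytes and the choice of UTF-8 are assumptions for illustration only; 
nothing here is existing Malhar code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Malhar): converts a tuple to bytes,
// falling back to toString() for types other than byte[] and String.
final class TupleBytes {
    private TupleBytes() {}

    static byte[] toBytes(Object tuple) {
        if (tuple instanceof byte[]) {
            // Already bytes: pass through unchanged.
            return (byte[]) tuple;
        }
        // String.valueOf returns the String itself for String tuples and
        // calls toString() for any other non-null object.
        // UTF-8 is an assumed encoding, not something the proposal fixes.
        return String.valueOf(tuple).getBytes(StandardCharsets.UTF_8);
    }
}
```

With this approach an arbitrary POJO is still written, but its output is only 
as meaningful as its toString() implementation.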

Questions:
1. Does it make sense to keep csv/json/avro conversion separate from writing to 
HDFS?

2. To restrict the allowed types: 
Should we just have two input ports, for byte[] and String respectively? By 
doing this, we can formally disqualify any other type.
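
A rough sketch of the two-port idea in plain Java, with no Apex dependency. The 
class TwoPortWriterSketch, its method names, and the in-memory buffer standing in 
for the HDFS file are all assumptions for illustration; in the real operator 
these would be typed input ports rather than plain methods.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the proposed operator: one typed entry point
// per accepted tuple type, so any other type is rejected at compile time.
final class TwoPortWriterSketch {
    // In-memory buffer used here in place of the output file.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final byte[] separator;

    TwoPortWriterSketch(String tupleSeparator) {
        this.separator = tupleSeparator.getBytes(StandardCharsets.UTF_8);
    }

    // Stands in for the byte[] input port.
    void processBytes(byte[] tuple) {
        write(tuple);
    }

    // Stands in for the String input port (UTF-8 is an assumed encoding).
    void processString(String tuple) {
        write(tuple.getBytes(StandardCharsets.UTF_8));
    }

    // Appends the tuple followed by the custom tuple separator.
    private void write(byte[] bytes) {
        out.write(bytes, 0, bytes.length);
        out.write(separator, 0, separator.length);
    }

    String contents() {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Since only byte[] and String entry points exist, there is no code path that 
needs a toString() fallback.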

> concrete operator for writing to HDFS file
> ------------------------------------------
>
>                 Key: APEXMALHAR-2009
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Yogi Devendra
>            Assignee: Yogi Devendra
>
> Currently, for writing to an HDFS file, we have AbstractFileOutputOperator in 
> the malhar library.
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to app developers. 
> But someone who is new to Apex would look for a ready-made implementation 
> instead of extending the abstract implementation.
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to 
> malhar. The aim of this operator would be to serve as a ready-to-use 
> operator for the most frequent use-cases.
> Here are my key observations on most frequent use-cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String. 
> 2. All tuples on a particular stream end up in the same output file.
> 3. App developer may want to add some custom tuple separator (e.g. newline 
> character) between tuples.
> Discussion thread on mailing list here:
> http://mail-archives.apache.org/mod_mbox/apex-dev/201603.mbox/%3CCAHekGF_6KovS4cjYXzCLdU9En0iPaKO%2BBv%3DEJXbrCuhe9%2BtdrA%40mail.gmail.com%3E



