[ 
https://issues.apache.org/jira/browse/SAMZA-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169852#comment-14169852
 ] 

Jonathan Herriott commented on SAMZA-429:
-----------------------------------------

Basically, yes.  However, I'd still keep it a loose data model, in the sense 
that the Serde isn't responsible for making sure the structure is correct.  
The Serdes are currently declared in the properties file, so they appear to 
be meant to be decoupled from the Task.  In practice, though, a Task needs 
intimate knowledge of what the Serde returns (you have to type cast 
envelope.getMessage()), which tightly couples the Task to the Serde, only 
with loose semantics, hence the type cast.
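
To make the coupling concrete, here is a minimal sketch.  The Envelope class 
is a local stand-in I made up so the example runs without Samza on the 
classpath (it only mimics a getMessage() that returns Object), and the 
"userId" field is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CastCouplingSketch {
    /** Stand-in (assumption) for an envelope whose getMessage() returns Object. */
    static final class Envelope {
        private final Object message;

        Envelope(Object message) {
            this.message = message;
        }

        Object getMessage() {
            return message;
        }
    }

    /** Task logic that silently assumes the configured Serde produced a Map. */
    @SuppressWarnings("unchecked")
    static String userIdOf(Envelope envelope) {
        Map<String, Object> message = (Map<String, Object>) envelope.getMessage();
        return String.valueOf(message.get("userId"));
    }

    public static void main(String[] args) {
        Map<String, Object> json = new HashMap<>();
        json.put("userId", "u1");
        System.out.println(userIdOf(new Envelope(json)));

        // Swapping the Serde in config changes the runtime type, and the cast fails.
        try {
            userIdOf(new Envelope("not a map"));
        } catch (ClassCastException e) {
            System.out.println("a different Serde breaks the task");
        }
    }
}
```

The configuration says the Serde is swappable, but the cast inside the task 
says otherwise.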

Personally, I'd take an approach similar to how JSON is usually decoded 
(JSONObjects, etc.).  A loose data model like this lets you send messages 
that aren't structured exactly the same to the same queue, to be processed by 
the same Samza job, because the job only cares about the fields it requires 
and ignores anything extra.
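
As a rough sketch of what I mean by a loose model (the Record class, its 
method names, and the "userId", "url", and "buttonId" fields are all 
hypothetical, not an existing Samza API):

```java
import java.util.HashMap;
import java.util.Map;

public class LooseRecordSketch {
    /** Hypothetical loose record: a task asks only for the fields it needs. */
    static final class Record {
        private final Map<String, Object> fields;

        Record(Map<String, Object> fields) {
            this.fields = new HashMap<>(fields);
        }

        boolean has(String name) {
            return fields.containsKey(name);
        }

        String getString(String name) {
            Object value = fields.get(name);
            return value == null ? null : value.toString();
        }
    }

    /** Hypothetical task logic: only "userId" is required; extras are ignored. */
    static String describe(Record message) {
        if (!message.has("userId")) {
            return "skipped";
        }
        return "user=" + message.getString("userId");
    }

    public static void main(String[] args) {
        Map<String, Object> pageView = new HashMap<>();
        pageView.put("userId", "u1");
        pageView.put("url", "/home");

        Map<String, Object> click = new HashMap<>();
        click.put("userId", "u2");
        click.put("buttonId", 7);

        // Two differently shaped messages go through the same task logic.
        System.out.println(describe(new Record(pageView)));
        System.out.println(describe(new Record(click)));
    }
}
```

Nothing here validates the overall structure; the task checks for the one 
field it depends on and leaves the rest alone.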

Right now, I could enforce something like this by wrapping the current Serdes 
so that they return exactly this type of structure.  However, based on how 
the configuration file is structured, I believe this should already be the 
case.
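
A wrapper along those lines might look roughly like the sketch below.  The 
Serde interface is a local stand-in that only mirrors the toBytes/fromBytes 
shape of Samza's Serde so the example is self-contained.  Record, 
RecordSerde, and the toy KeyValueSerde (standing in for a JSON or Avro Serde) 
are names I invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordSerdeSketch {
    /** Local stand-in (assumption) with the toBytes/fromBytes shape of a Serde. */
    interface Serde<T> {
        byte[] toBytes(T object);
        T fromBytes(byte[] bytes);
    }

    /** Hypothetical common view that every wrapped Serde would return. */
    interface Record {
        Object get(String field);
        Map<String, Object> asMap();
    }

    /** Toy "key=value;key=value" Serde standing in for a real protocol. */
    static final class KeyValueSerde implements Serde<Map<String, Object>> {
        public byte[] toBytes(Map<String, Object> map) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Object> e : map.entrySet()) {
                if (sb.length() > 0) {
                    sb.append(';');
                }
                sb.append(e.getKey()).append('=').append(e.getValue());
            }
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        }

        public Map<String, Object> fromBytes(byte[] bytes) {
            Map<String, Object> map = new LinkedHashMap<>();
            String text = new String(bytes, StandardCharsets.UTF_8);
            if (text.isEmpty()) {
                return map;
            }
            for (String pair : text.split(";")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip malformed pairs instead of failing
                }
                map.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            return map;
        }
    }

    /** Wraps any Map-producing Serde and exposes its output as a Record. */
    static final class RecordSerde implements Serde<Record> {
        private final Serde<Map<String, Object>> delegate;

        RecordSerde(Serde<Map<String, Object>> delegate) {
            this.delegate = delegate;
        }

        public byte[] toBytes(Record record) {
            return delegate.toBytes(record.asMap());
        }

        public Record fromBytes(byte[] bytes) {
            final Map<String, Object> fields = delegate.fromBytes(bytes);
            return new Record() {
                public Object get(String field) {
                    return fields.get(field);
                }

                public Map<String, Object> asMap() {
                    return fields;
                }
            };
        }
    }

    public static void main(String[] args) {
        Serde<Record> serde = new RecordSerde(new KeyValueSerde());
        byte[] wire = "userId=u1;url=/home".getBytes(StandardCharsets.UTF_8);
        Record record = serde.fromBytes(wire);
        System.out.println(record.get("userId"));
    }
}
```

With this shape, the task would only ever see Record, and changing the 
protocol would mean changing which delegate is wrapped.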



> Decouple Protocol from Task
> ---------------------------
>
>                 Key: SAMZA-429
>                 URL: https://issues.apache.org/jira/browse/SAMZA-429
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Jonathan Herriott
>
> Maybe someone can point me in the right direction if this is wrong.  One 
> thing I've disliked about tasks is the fact that the protocols have to be 
> baked directly into the Task, so if you want to process JSON, you have to 
> treat the message contents as a HashMap, but if you want to use Avro, it 
> needs to be treated as a GenericRecord object, etc.  I think it would be 
> super beneficial to fully abstract this from the Task object and just treat 
> each thing as a "Message" object.  I think the advantage of this is that you 
> can test with JSON and run with Avro in production or whatever as debugging 
> with JSON is a lot easier than Avro.
> The thing is, in the Task, I only care about the structure, I don't really 
> care about what protocol it is.  Maybe this statement is a bit naive, but I 
> don't think there would ever be a good situation in which you would pass just 
> a string or integer or whatever instead of some form of hierarchical message. 
>  In my opinion, all Serde should return a common interface for a Record for 
> deserialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
