[ 
https://issues.apache.org/jira/browse/TEZ-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074712#comment-14074712
 ] 

Bikas Saha commented on TEZ-1194:
---------------------------------

Looks good overall.

Also, the serde of the *EntityDescriptor objects is not handling the version. 
So it will get lost across RPC. 
TestDagTypeConverters#testTezEntityDescriptorSerialization() could be enhanced 
to make sure version is being handled correctly for the payload. Maybe also an 
enhancement of an existing test in TestHistoryEventsProtoConversion.java.

These if stmts should actually become if(userPayload.getPayload() != null) or 
better if(userPayload.hasPayload()). Right? Or are we saying that 
TezUserPayload will be null if no byte[] payload has been specified. I would 
prefer that we always get a TezUserPayload object. The actual payload inside it 
may be null.
There are other if stmts like this in the patch.
{code}       if (userPayload != null) {
-        boolean doLocalityCheck = userPayload[0] > 0 ? true : false;
+        boolean doLocalityCheck = userPayload.getPayload()[0] > 0 ? true : 
false;{code}

IMO, in most cases, we should not be using DAGTypeConverters when non-serde or 
non-proto code is involved.
This should just be "new TezUserPayload(bb)" right? DAGTypeConverters should 
not be used in the example code. Its supposed to be @Private. Same for 
IntersectDataGen.java and other examples.
{code}+    byte[] bb = {(byte) (doLocalityCheck ? 1 : 0), 1};
+    TezUserPayload procPayload = 
DagTypeConverters.convertToTezUserPayload(bb);{code}
Similarly, this and others like this in the patch can become "new 
TezUserPayload(byte[]) " instead of using DAGTypeConverters.
{code}
-    byte[] userPayload = payloadProto.build().toByteArray();
+    TezUserPayload userPayload = 
DagTypeConverters.convertToTezUserPayload(payloadProto.build().toByteArray());{code}
...
// from TestDAGPlan
ProcessorDescriptor pd1 = new ProcessorDescriptor("processor1").
-        setUserPayload("processor1Bytes".getBytes());
+        
setUserPayload(DagTypeConverters.convertToTezUserPayload("processor1Bytes".getBytes()));
...
// from Configurer code
@InterfaceAudience.Private
   public void fromByteArray(byte[] payload) {
+    fromUserPayload(DagTypeConverters.convertToTezUserPayload(payload));
+  }
...
-  public byte[] getInputPayload() {
-    return inputConf.toByteArray();
+  public TezUserPayload getInputPayload() {
+    return DagTypeConverters.convertToTezUserPayload(inputConf.toByteArray());
   }
{code}
In most cases, we should be seeing the DAGTypeConverter code only in near the 
RPC serde code.

We can probably drop the Tez in name and just call it UserPayload.

We should make sure we dont do additional byte[] copies when we do this :)


> Make TezUserPayload user facing for payload specification and change to 
> ByteBuffer
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-1194
>                 URL: https://issues.apache.org/jira/browse/TEZ-1194
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Tsuyoshi OZAWA
>         Attachments: TEZ-1194.1.patch, TEZ-1194.2.patch
>
>
> Now that we have TezUserPayload being used internally to represent user 
> payload it may be useful to make it user facing on the API for specifying 
> payloads. Advantages
> 1) Clear code for the user instead of having untyped byte[] everywhere
> 2) Lets us internally evolve the representation of user payload and make it 
> more efficient without having to break APIs. We can start with 
> TezUserPayload(byte[]) and then move on to TezUserPayload(ByteBuffer) and so 
> on while maintaining backwards compatibility without needing to add new 
> methods. Old code can be translated within TezUserPayload while user migrates 
> the code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to