Lorand Bendig created PIG-3911:
----------------------------------

             Summary: Define unique fields with @OutputSchema
                 Key: PIG-3911
                 URL: https://issues.apache.org/jira/browse/PIG-3911
             Project: Pig
          Issue Type: Improvement
            Reporter: Lorand Bendig
            Assignee: Lorand Bendig


As a continuation of PIG-2361, I think that {{@OutputSchema}} could be extended
to eliminate the repetitive {{EvalFunc#outputSchema()}} implementations
found in most UDFs.
I propose the following syntax:

Complex schema definition:
{code}
@OutputSchema("y:bag{t:tuple(len:int,word:chararray,${0}:int)},${1}:chararray,${2}:bytearray")
@SchemaFields({
  @Unique(name="word"),
  @Unique(name="${0}"),
  @Unique(name="${1}", prefix="id"),
  @Unique(name="${2}", prefix="item", postfix="id")}
)
public class MyUDF {...}
{code}
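To make the shape of the proposed annotations concrete, here is a minimal sketch of how they might be declared and read back by reflection. Everything in it is an assumption for illustration: {{@Unique}} and {{@SchemaFields}} do not exist in Pig, the empty-string defaults are a guess, and {{OutputSchema}} is redeclared locally as a stand-in for the annotation from PIG-2361 so that the example compiles without Pig on the classpath. {{SchemaAnnotations}} is a hypothetical helper name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Pig's @OutputSchema (assumption: a single String value).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface OutputSchema {
    String value();
}

// Hypothetical: marks one field (or placeholder) whose alias must be made unique.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Unique {
    String name() default "";     // empty: applies to the single top-level field
    String prefix() default "";   // only meaningful for ${n} placeholders
    String postfix() default "";  // only meaningful for ${n} placeholders
}

// Hypothetical container for several @Unique entries.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SchemaFields {
    Unique[] value();
}

// Placeholder UDF class; a real one would extend EvalFunc.
@OutputSchema("y:bag{t:tuple(len:int,word:chararray,${0}:int)},${1}:chararray,${2}:bytearray")
@SchemaFields({
    @Unique(name = "word"),
    @Unique(name = "${0}"),
    @Unique(name = "${1}", prefix = "id"),
    @Unique(name = "${2}", prefix = "item", postfix = "id")
})
class MyUDF {
}

final class SchemaAnnotations {
    /** Returns the names listed in @SchemaFields, in declaration order. */
    static List<String> uniqueNames(Class<?> udf) {
        List<String> names = new ArrayList<>();
        SchemaFields fields = udf.getAnnotation(SchemaFields.class);
        if (fields != null) {
            for (Unique u : fields.value()) {
                names.add(u.name());
            }
        }
        return names;
    }
}
```

Runtime retention is what would let {{EvalFunc}} inspect the annotations when the schema is requested.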
Rewrite rules:
{code}
word => "word" + "_" + nextSchemaId
${0} => this.getClass().getName().toLowerCase() + "_" + nextSchemaId
${1} => "id" + "_" + nextSchemaId
${2} => "item" + "_" + nextSchemaId + "_" + "id"
{code}
Result:
{code}
y:bag{t:tuple(len:int,word_1:chararray,com.example.myudf_5:int)},id_8:chararray,item_9_id:bytearray
{code}

Prefix and postfix attributes would be applied only to placeholders.
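
The rewrite rules could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions: {{UniqueSpec}} and {{SchemaRewriter}} are hypothetical names, the counter is a simple stand-in for {{nextSchemaId}}, an empty string means "attribute not set", and a field is located by matching its name directly before a colon.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical plain-Java stand-in for one @Unique entry. */
final class UniqueSpec {
    final String name;
    final String prefix;   // empty string means "not set"
    final String postfix;  // empty string means "not set"

    UniqueSpec(String name, String prefix, String postfix) {
        this.name = name;
        this.prefix = prefix;
        this.postfix = postfix;
    }

    /** Placeholders are written as ${n} in the schema string. */
    boolean isPlaceholder() {
        return name.startsWith("${") && name.endsWith("}");
    }
}

final class SchemaRewriter {
    // Stand-in for nextSchemaId; incremented once per rewritten field.
    private int nextSchemaId;

    SchemaRewriter(int firstId) {
        this.nextSchemaId = firstId;
    }

    /** Applies the rewrite rule for a single field or placeholder. */
    String uniqueName(UniqueSpec spec, String udfClassName) {
        int id = nextSchemaId++;
        if (!spec.isPlaceholder()) {
            // Literal field names ignore prefix and postfix.
            return spec.name + "_" + id;
        }
        String base = spec.prefix.isEmpty()
                ? udfClassName.toLowerCase(Locale.ROOT)
                : spec.prefix;
        String unique = base + "_" + id;
        return spec.postfix.isEmpty() ? unique : unique + "_" + spec.postfix;
    }

    /** Replaces every listed field name in the schema string with its unique form. */
    String rewrite(String schema, String udfClassName, UniqueSpec... specs) {
        String result = schema;
        for (UniqueSpec spec : specs) {
            // Match the name only where it is an alias: not preceded by an
            // identifier character and directly followed by ':'.
            Pattern field = Pattern.compile(
                    "(?<![A-Za-z0-9_])" + Pattern.quote(spec.name) + "(?=:)");
            String unique = uniqueName(spec, udfClassName);
            result = field.matcher(result).replaceAll(Matcher.quoteReplacement(unique));
        }
        return result;
    }
}
```

In this sketch the ids are handed out consecutively from the starting value, so the concrete numbers depend on that value rather than on any state inside Pig. A real implementation would more likely work on the parsed {{Schema}} than on the string.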

Single field definitions:
{code}
@OutputSchema("double")
=> equivalent to:
return new Schema(new Schema.FieldSchema(null, DataType.DOUBLE)); 
{code}
{code}
@OutputSchema("d:double")
=> equivalent to:
return new Schema(new Schema.FieldSchema("d", DataType.DOUBLE)); 
{code}
{code}
@OutputSchema("tuple")
@Unique
=> equivalent to:
return new Schema(new Schema.FieldSchema(getSchemaName(  
  this.getClass().getName().toLowerCase(), input), DataType.TUPLE));
{code}
{code}
@OutputSchema("words:tuple")
@Unique
=> equivalent to:
return new Schema(new Schema.FieldSchema(getSchemaName("words", input), 
  DataType.TUPLE));
{code}
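
The four single-field cases above follow one pattern, which might be sketched without any Pig dependency as below. The names {{SingleField}} and {{parse}} are hypothetical, the type is kept as a plain string instead of a {{DataType}} constant, and the sketch ignores the {{input}} argument that {{getSchemaName}} receives in the examples above, so it is a simplification and not the proposed implementation.

```java
import java.util.Locale;

/** Hypothetical result of interpreting a single-field @OutputSchema value. */
final class SingleField {
    final String alias;  // null when the schema string declares no alias
    final String type;   // type name as written, e.g. "double" or "tuple"

    private SingleField(String alias, String type) {
        this.alias = alias;
        this.type = type;
    }

    /**
     * Splits "alias:type" or "type". When unique is true, the alias (or, if
     * there is none, the lower-cased UDF class name) gets "_" + schemaId appended.
     */
    static SingleField parse(String schema, boolean unique,
                             String udfClassName, int schemaId) {
        int colon = schema.indexOf(':');
        String alias = colon < 0 ? null : schema.substring(0, colon).trim();
        String type = (colon < 0 ? schema : schema.substring(colon + 1)).trim();
        if (unique) {
            String base = alias != null
                    ? alias
                    : udfClassName.toLowerCase(Locale.ROOT);
            alias = base + "_" + schemaId;
        }
        return new SingleField(alias, type);
    }
}
```

With this shape, {{parse("d:double", false, ...)}} keeps the alias {{d}} unchanged, while {{parse("tuple", true, "com.example.MyUDF", 7)}} falls back to the class name and yields the alias {{com.example.myudf_7}}.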


