[ 
https://issues.apache.org/jira/browse/PIG-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1984.
---------------------------------

    Resolution: Fixed

The 0.9 documentation already includes the following:

Known Schema Handling

Note the following:

    * You can define a schema that includes both the field name and field type.
    * You can define a schema that includes the field name only; in this case, 
the field type defaults to bytearray.
    * You can choose not to define a schema; in this case, the fields are 
unnamed and their types default to bytearray.

If you assign a name to a field, you can refer to that field by name or by 
positional notation. If you don't assign a name to a field (the field is 
unnamed), you can refer to it only by positional notation.

If you assign a type to a field, you can subsequently change the type using the 
cast operators. If you don't assign a type to a field, the field defaults to 
bytearray; you can change the default type using the cast operators.
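
As a minimal sketch of the three cases above (the file name 'data.txt' and the 
field names are hypothetical, chosen only for illustration):

```pig
-- 'data.txt' is an assumed two-column, tab-delimited input file.
A = LOAD 'data.txt' AS (name:chararray, age:int);  -- field names and types
B = LOAD 'data.txt' AS (name, age);                -- names only; types default to bytearray
C = LOAD 'data.txt';                               -- no schema; fields are unnamed

-- Named fields can be referenced by name or by position ($1 is age here).
A1 = FOREACH A GENERATE name, $1;

-- Fields that defaulted to bytearray can be given a type with a cast.
B1 = FOREACH B GENERATE (chararray)name, (int)age;

-- Unnamed fields can be referenced by position only.
C1 = FOREACH C GENERATE $0, (int)$1;
```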

Unknown Schema Handling

Note the following:

    * When you JOIN/COGROUP/CROSS multiple relations, if any relation has a 
null schema (no defined schema), the schema for the resulting relation is null.
    * If you FLATTEN a bag with empty inner schema, the schema for the 
resulting relation is null.
    * If you UNION two relations with incompatible schemas, the schema for 
the resulting relation is null.
    * If the schema is null, Pig treats all fields as bytearray (in the 
backend, Pig determines the real type of the fields dynamically).
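
The JOIN case above can be sketched as follows (the file names and field 
names are hypothetical; this is an illustration, not text from the 
documentation):

```pig
-- 'a.txt' and 'b.txt' are assumed input files.
A = LOAD 'a.txt' AS (id:int, v:chararray);  -- defined schema
B = LOAD 'b.txt';                           -- null schema (nothing declared)

-- B has a null schema, so the schema of the joined relation is null too.
J = JOIN A BY id, B BY $0;
DESCRIBE J;  -- expected to report that the schema for J is unknown

-- With a null schema, fields can be referenced by position only.
K = FOREACH J GENERATE $0, $1;
```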


> Need to clarify unknown schema
> ------------------------------
>
>                 Key: PIG-1984
>                 URL: https://issues.apache.org/jira/browse/PIG-1984
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Corinne Chandel
>             Fix For: 0.9.0
>
>
> We need to clarify how unknown schema is used in Pig. For every field, if 
> the user doesn't tell us the data type, we use bytearray to denote an 
> unknown type. When we don't even know how many fields there are, Pig 
> derives an unknown (null) schema.
> For example:
> a = load '1.txt' as (a0, b0);
> describe a;
> a: {a0: bytearray,b0: bytearray}
> a = load '1.txt';
> describe a;
> a: Schema for a unknown

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira