[ https://issues.apache.org/jira/browse/PIG-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-1984.
---------------------------------
Resolution: Fixed
The following is already included in the 0.9 documentation:
Known Schema Handling
Note the following:
* You can define a schema that includes both the field name and field type.
* You can define a schema that includes the field name only; in this case,
the field type defaults to bytearray.
* You can choose not to define a schema; in this case, the field is
unnamed and the field type defaults to bytearray.
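As an illustration, the three cases above might be written as follows. The file name `data.txt` and the field names are hypothetical, and the comments describe the schema each relation should have under these rules, not captured DESCRIBE output.

```pig
-- Field names and types declared: name is chararray, age is int.
A = LOAD 'data.txt' AS (name:chararray, age:int);

-- Field names only: both fields default to bytearray.
B = LOAD 'data.txt' AS (name, age);

-- No schema: fields are unnamed and default to bytearray.
C = LOAD 'data.txt';

-- DESCRIBE prints the schema Pig derived for each relation.
DESCRIBE A;
DESCRIBE B;
DESCRIBE C;
```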
If you assign a name to a field, you can refer to that field by name or by
positional notation. If you don't assign a name to a field (the field is
unnamed), you can refer to the field only by positional notation.
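A minimal sketch of both forms of reference, assuming a hypothetical tab-delimited file `data.txt` with at least two fields. Positional references start at $0 for the first field.

```pig
B = LOAD 'data.txt' AS (name, age);   -- named fields (bytearray)
C = LOAD 'data.txt';                  -- no schema: unnamed fields

-- Named fields can be referenced by name or by position ($1 is age here).
X = FOREACH B GENERATE name, $1;

-- Unnamed fields can be referenced only by position.
Y = FOREACH C GENERATE $0, $1;
```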
If you assign a type to a field, you can subsequently change the type using the
cast operators. If you don't assign a type to a field, the field defaults to
bytearray; you can change the default type using the cast operators.
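The cast operators mentioned above are written as a type name in parentheses before the field reference. A short sketch, again using a hypothetical `data.txt` and hypothetical field names:

```pig
A = LOAD 'data.txt' AS (name:chararray, age:int);
B = LOAD 'data.txt' AS (name, age);
C = LOAD 'data.txt';

-- Change a declared type: age goes from int to long.
A2 = FOREACH A GENERATE name, (long)age;

-- Replace the bytearray default of named fields.
B2 = FOREACH B GENERATE (chararray)name, (int)age;

-- Cast unnamed fields, referenced by position.
C2 = FOREACH C GENERATE (chararray)$0, (int)$1;
```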
Unknown Schema Handling
Note the following:
* When you JOIN/COGROUP/CROSS multiple relations, if any relation has a
null schema (no defined schema), the schema for the resulting relation is null.
* If you FLATTEN a bag with an empty inner schema, the schema for the
resulting relation is null.
* If you UNION two relations with incompatible schemas, the schema for the
resulting relation is null.
* If the schema is null, Pig treats all fields as bytearray (in the
backend, Pig determines the real type of each field dynamically).
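The CROSS and UNION rules above can be exercised with a short script. File and field names are hypothetical; the comments state the outcome the 0.9 documentation above predicts, and DESCRIBE output may differ in other Pig versions.

```pig
A = LOAD 'a.txt' AS (id:int, val:chararray);
B = LOAD 'b.txt';                      -- no schema: null schema

-- One input has a null schema, so the result schema is expected to be null.
X = CROSS A, B;
DESCRIBE X;

-- Different field counts make these schemas incompatible,
-- so the UNION result schema is expected to be null.
U1 = LOAD 'u1.txt' AS (x:int, y:int);
U2 = LOAD 'u2.txt' AS (x:int);
U = UNION U1, U2;
DESCRIBE U;

-- With a null schema, fields are still reachable by position
-- and can be cast explicitly.
Y = FOREACH X GENERATE $0, (chararray)$2;
```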
> Need to clarify unknown schema
> ------------------------------
>
> Key: PIG-1984
> URL: https://issues.apache.org/jira/browse/PIG-1984
> Project: Pig
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Corinne Chandel
> Fix For: 0.9.0
>
>
> We need to clarify how unknown schema is used in Pig. For every field, if
> the user doesn't tell us the data type, we use bytearray to denote an unknown
> type. When we don't even know how many fields there are, Pig derives an
> unknown (null) schema.
> For example:
> a = load '1.txt' as (a0, b0);
> a: {a0: bytearray,b0: bytearray}
> a = load '1.txt';
> a: Schema for a unknown
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira