[
https://issues.apache.org/jira/browse/PIG-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083939#comment-13083939
]
Vivek Padmanabhan commented on PIG-2217:
----------------------------------------
For the script mentioned above, the schema is already null in the logical
layer, i.e. LOStore.getSchema() returns null.
Each operator derives its schema from its predecessor operators, and the schema
object for LOLoad is itself null, so the null propagates down to the store.
Hence this will happen for every script that does not define a schema in the
load statement.
In Pig 0.7, even if the schema is null in the logical layer, it is wrapped in
an empty schema during translation.
For example, in LogToPhyTranslationVisitor:
public void visit(LOStore loStore) throws VisitorException {
....
store.setSchema(new Schema(loStore.getSchema()));
Hence the files will look like this:
.pig_header (empty file)
.pig_schema
-----------
{"fields":[],"version":0,"sortKeys":[-1],"sortKeyOrders":["ASCENDING"]}
But from 0.8 (new logical plan) onwards, the null value is returned directly,
so the metadata is not saved.
This change in behaviour came with the new logical plan introduced in Pig 0.8
and carried over into Pig 0.9.
Disabling the new logical plan in 0.8 (pig -useversion 0.8
-Dpig.usenewlogicalplan=false) will produce the
".pig_header" and ".pig_schema" files.
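The difference between the two translation behaviours described above can be
sketched in isolation. This is only an illustration: the Schema class here is a
hypothetical stand-in (not org.apache.pig's Schema), and translateWithWrap,
translatePassThrough and wouldStoreMetadata are made-up names that model the
0.7 wrapping, the 0.8 pass-through, and the null check that decides whether
metadata gets written.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSchemaDemo {

    /** Hypothetical stand-in for a Pig schema: just a list of field names. */
    static final class Schema {
        final List<String> fields = new ArrayList<>();

        /** Copy constructor; a null source yields an empty schema (assumed to model the 0.7 wrapping). */
        Schema(Schema other) {
            if (other != null) {
                fields.addAll(other.fields);
            }
        }
    }

    /** 0.7-style translation: always wrap, so the result is never null. */
    static Schema translateWithWrap(Schema logicalSchema) {
        return new Schema(logicalSchema);
    }

    /** 0.8-style translation: hand the logical schema through unchanged. */
    static Schema translatePassThrough(Schema logicalSchema) {
        return logicalSchema;
    }

    /** Hypothetical committer check: metadata is written only for a non-null schema. */
    static boolean wouldStoreMetadata(Schema storeSchema) {
        return storeSchema != null;
    }

    public static void main(String[] args) {
        Schema fromLoadWithoutSchema = null; // models a LOAD with no schema defined

        System.out.println("wrap: "
                + wouldStoreMetadata(translateWithWrap(fromLoadWithoutSchema)));
        System.out.println("pass-through: "
                + wouldStoreMetadata(translatePassThrough(fromLoadWithoutSchema)));
    }
}
```

With a null logical schema, the wrapping variant still yields an (empty) schema
object, so the metadata step runs; the pass-through variant yields null and the
step is skipped.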
> POStore.getSchema() returns null if I dont have a schema defined at load
> statement
> ----------------------------------------------------------------------------------
>
> Key: PIG-2217
> URL: https://issues.apache.org/jira/browse/PIG-2217
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Vivek Padmanabhan
>
> If I don't specify a schema definition in the load statement,
> POStore.getSchema() returns null, because of which PigOutputCommitter does not
> store the schema.
> For example, if I run the script below, the ".pig_header" and ".pig_schema"
> files won't be saved.
> load_1 = LOAD 'i1' USING PigStorage();
> ordered_data_1 = ORDER load_1 BY * ASC PARALLEL 1;
> STORE ordered_data_1 INTO 'myout' using
> org.apache.pig.piggybank.storage.PigStorageSchema();
> This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema
> is not invoked in these cases.