[
https://issues.apache.org/jira/browse/PIG-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025155#comment-16025155
]
Daniel Dai commented on PIG-5231:
---------------------------------
Vote for 3. We pick the first schema found in the input dirs in every LoadFunc,
such as OrcStorage and AvroStorage. I don't think we should make an exception
for PigStorage. +1 for the patch.
> PigStorage with -schema may produce inconsistent outputs with more fields
> -------------------------------------------------------------------------
>
> Key: PIG-5231
> URL: https://issues.apache.org/jira/browse/PIG-5231
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-5231-v01.patch
>
>
> When multiple directories are passed to PigStorage(',','-schema'), Pig
> follows the documented behavior:
> {quote}
> No attempt to merge conflicting schemas is made during loading. The first
> schema encountered during a file system scan is used.
> {quote}
> For input from two directories with the schemas
> file1: (f1:chararray, f2:int) and
> file2: (f1:chararray, f2:int, f3:int)
> Pig will pick the first schema, from file1, and only allow access to f1 and
> f2. However, the output will still contain 3 fields for tuples from file2.
> This later leads to completely corrupt output, because the shifted fields
> result in incorrect references.
> (This may also happen when the input itself contains the delimiter.)
> If the file2 schema is picked instead, this is already handled by filling
> the missing fields with null. (PIG-3100)
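
The field shift described above can be reproduced with a small simulation. This is only a sketch in plain Python, not Pig's code: the row data, the appended "x" column, and the helper names parse_line, fit_to_schema and render are all hypothetical. Padding short tuples with null mirrors what the description attributes to PIG-3100; dropping the extra fields is one possible remedy and is an assumption here, not necessarily what pig-5231-v01.patch does.

```python
# Illustrative simulation only; this is not Pig's implementation.
SCHEMA = ["f1", "f2"]  # the first schema found (file1 in the description)


def parse_line(line, delimiter=","):
    """Split a raw line into fields, as a delimiter-based loader would."""
    return line.split(delimiter)


def fit_to_schema(fields, width):
    """Pad short tuples with None and drop fields beyond the schema width."""
    padded = fields + [None] * (width - len(fields))
    return padded[:width]


def render(fields, delimiter=","):
    """Write a tuple back out as delimited text; None becomes an empty field."""
    return delimiter.join("" if v is None else str(v) for v in fields)


file1_rows = ["a,1"]    # matches (f1, f2)
file2_rows = ["b,2,3"]  # carries an extra f3

raw = [parse_line(r) for r in file1_rows + file2_rows]

# A downstream step appends one column after the two fields the schema promises.
# Without normalization the new column lands at a different position per row.
unfixed = [render(t + ["x"]) for t in raw]

# With every tuple fitted to the schema width, the column positions line up.
fixed = [render(fit_to_schema(t, len(SCHEMA)) + ["x"]) for t in raw]

print(unfixed)
print(fixed)
```

In the unfixed rows the appended value sits in the third column for the file1 tuple and in the fourth for the file2 tuple, which is the kind of misaligned reference the description warns about.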
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)