Use parquet pig loader to read parquet file

[email protected] Wed, 03 Jan 2018 09:39:31 -0800

Hi

I have two Parquet files that were created in different ways:


1st
Load a protobuf file with Pig, then store it with the Parquet Pig storer
(parquet-pig-bundle 1.9.0).

2nd
Convert protobuf to Parquet with this patch applied
(https://github.com/apache/parquet-mr/pull/411), or store the file with Spark.

But when I use the Parquet Pig loader to read the 2nd file and describe its
schema, it shows a different schema from the one I get when I read the 1st
file with the same loader (see "read example and describe" below). Also, after
reading the 2nd file I can't use "." to reach the tuple content as in a normal
Pig script. The error message is "Cannot find field name in
element:tuple(name:chararray,name2:chararray)".

*read example and describe:
1st case:
  value:(guid: chararray, blob: (info: {info_tuple: (name: chararray,name2: chararray)}))
2nd case:
  value:(guid: chararray, blob: (info: {list: (element: (name: chararray,name2: chararray))}))

*access tuple:
1st case:
  B = foreach A generate value.blob.info.name;
2nd case:
  B = foreach A generate value.blob.info.name;
  -- error message: Cannot find field name in element:tuple(name:chararray,name2:chararray)


I asked Julien, and in the 2nd case I can add ".element." to the path, like
"B = foreach A generate value.blob.info.element.name;", to get the tuple
value. But "element" is a virtual level, like "info_tuple", so maybe a
solution is needed so that scripts do not have to name the virtual level.
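
To make the workaround concrete, here is a minimal sketch for the 2nd case.
The jar location and the input path are placeholders I made up, and the loader
class name assumes the org.apache.parquet package used by parquet-pig 1.9.0.

```pig
-- Hypothetical jar location and input path; adjust to your environment.
REGISTER parquet-pig-bundle-1.9.0.jar;

A = LOAD '/path/to/second_case.parquet'
    USING org.apache.parquet.pig.ParquetLoader();

-- For the 2nd case this reports the list/element nesting shown above.
DESCRIBE A;

-- Fails with "Cannot find field name in element:tuple(...)":
-- B = FOREACH A GENERATE value.blob.info.name;

-- Workaround: name the intermediate "element" level explicitly.
B = FOREACH A GENERATE value.blob.info.element.name;
```

The same script with ".element" removed is what works for the 1st case, so one
script currently cannot read both files.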

Thank you and regards,
Abel ke

