Some question about parquet schema

[email protected] Mon, 20 Nov 2017 20:51:07 -0800

Hi,
We are developers at Trend Micro, and we are currently evaluating the Parquet 
data format, hoping to benefit from its smaller data size and efficient data 
loading.
However, we have found the following three different kinds of Parquet schema 
and don't know which one is correct (or which one is officially supported).
The 1st seems to be for Pig only, because we can read it with parquet-mr 
(parquet-pig-bundle).
Spark can read both the 2nd and 3rd cases successfully.
AWS Athena can read only the 2nd case.
My question is: we want to use the Parquet format in our production 
environment, so which schema should we use?


First case, using Pig:
[inline image: schema for the first case]

Second case, using Spark:
[inline image: schema for the second case]

Third case, using parquet-mr (parquet-protobuf):
[inline image: schema for the third case]


