To follow up, after discussing with more senior engineers at my company: I misread what Julien said about accessing by column index. I thought this was equivalent to the Thrift ID, but now I understand what he actually meant, and that solution is unfortunately not viable for our use case.
If you read this issue (https://github.com/Parquet/parquet-format/issues/91), what we want is solution #3, but the issue is still open and it looks like that approach was never implemented. So I'm going to have to add code that does essentially that :D

> I think it is easier and more general than that. Thrift already creates the
> Parquet schema, so you should be able to create a table definition from a
> Parquet file without even worrying about the original Thrift class. There are
> a couple of ways to do this, but none that I know of within Hive.

We COULD create a Hive table schema from the Parquet metadata, but that metadata could be out of date. We want to always use the most up-to-date Thrift schema.

Thank you all for your help; I think I know what I need to do now. At some point maybe I can contribute to the Parquet project to allow Hive to access columns by looking at the stored ID field instead of the field name.
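To illustrate why looking up columns by the stored ID matters, here is a minimal Python sketch of name-based versus ID-based column resolution. It assumes each column in an older file carries the Thrift field ID it was written with, and that the current Thrift definition may have renamed or added fields. `FileColumn`, `ThriftField`, `resolve_by_name`, and `resolve_by_id` are hypothetical names for illustration only, not Parquet, Thrift, or Hive APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileColumn:
    # Hypothetical: a column as recorded in an older file's schema.
    name: str
    field_id: Optional[int]


@dataclass(frozen=True)
class ThriftField:
    # Hypothetical: a field in the current (most up-to-date) Thrift definition.
    field_id: int
    name: str


def resolve_by_name(table: List[ThriftField],
                    file_cols: List[FileColumn]) -> Dict[str, Optional[str]]:
    # Map each current field to the file column with the same name (None if absent).
    names = {c.name for c in file_cols}
    return {f.name: (f.name if f.name in names else None) for f in table}


def resolve_by_id(table: List[ThriftField],
                  file_cols: List[FileColumn]) -> Dict[str, Optional[str]]:
    # Map each current field to the file column carrying the same field ID,
    # so a renamed field still finds its data.
    by_id = {c.field_id: c.name for c in file_cols if c.field_id is not None}
    return {f.name: by_id.get(f.field_id) for f in table}


# Hypothetical example: field 2 was renamed from "user_name" to "username"
# after the file was written, and field 3 ("email") was added later.
old_file = [FileColumn("id", 1), FileColumn("user_name", 2)]
current = [ThriftField(1, "id"), ThriftField(2, "username"), ThriftField(3, "email")]

by_name = resolve_by_name(current, old_file)
by_id = resolve_by_id(current, old_file)
```

With name-based lookup the renamed field comes back empty, while ID-based lookup still maps `username` to the `user_name` column in the old file; a field added after the file was written is missing under either approach.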
