[
https://issues.apache.org/jira/browse/HIVE-9692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergio Peña updated HIVE-9692:
------------------------------
Attachment: HIVE-9692.1.patch
> Allocate only parquet selected columns in HiveStructConverter class
> -------------------------------------------------------------------
>
> Key: HIVE-9692
> URL: https://issues.apache.org/jira/browse/HIVE-9692
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9692.1.patch
>
>
> HiveStructConverter is the class where Hive converts Parquet objects into Hive
> writable objects that are later parsed by object inspectors. This class
> allocates as many writable objects as there are columns in the file schema.
> {noformat}
> public HiveStructConverter(final GroupType requestedSchema, final GroupType
> tableSchema, Map<String, String> metadata) {
> ...
> this.writables = new Writable[fileSchema.getFieldCount()];
> ...
> }
> {noformat}
> This array is always allocated at the full file-schema size, even when only a
> few columns are selected. For example, selecting 2 columns from a 50-column
> table still allocates 50 objects: 2 are used and 48 are wasted.
> We should allocate only the requested number of columns in order
> to reduce memory usage.
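A minimal sketch of the idea (hypothetical names; a plain Object[] stands in for Writable[] so the example is self-contained): size the array by the number of selected columns rather than by the file schema's field count, and map each selected column's file-schema index to a compact slot.

```java
// Sketch only: CompactWritables and allocate() are illustrative names,
// not part of Hive. The real fix would size the array from
// requestedSchema.getFieldCount() instead of fileSchema.getFieldCount().
public class CompactWritables {

    // requestedFileIndices holds the positions (in the file schema) of the
    // columns actually selected by the query; the returned array has one
    // slot per selected column, not one per file-schema column.
    static Object[] allocate(int[] requestedFileIndices) {
        return new Object[requestedFileIndices.length];
    }

    public static void main(String[] args) {
        // 2 columns selected out of a 50-column table:
        int[] requested = {3, 17};
        Object[] writables = allocate(requested);
        System.out.println(writables.length); // prints 2, not 50
    }
}
```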
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)