amansinha100 commented on code in PR #3628:
URL: https://github.com/apache/hive/pull/3628#discussion_r983060737
##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java:
##########
@@ -91,11 +93,61 @@ public class ParquetHiveSerDe extends AbstractSerDe
     implements SchemaInference {
   private ObjectInspector objInspector;
   private ParquetHiveRecord parquetRow;
+  private ObjectInspectorConverters.Converter converter;
   public ParquetHiveSerDe() {
     parquetRow = new ParquetHiveRecord();
   }
+  // Recursively check if CHAR or VARCHAR types are used
+  private boolean needsConversion(TypeInfo type) {
Review Comment:
I'm thinking about the number of times needsConversion() and the subsequent
convert() would be called for a table scan that reads, say, N Parquet files,
each with m row groups. Doing these operations once per file or per row group
would be fine, but doing them on a per-row basis would be a performance hit.
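
For reference, a minimal sketch of what such a recursive CHAR/VARCHAR check could look like over Hive's TypeInfo tree. The serde2 TypeInfo classes below are real Hive types; the method body is my illustration of the idea, not the PR's actual code:

```java
import java.util.List;

import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.UnionTypeInfo;

final class CharVarcharCheck {

  // Walk a TypeInfo tree; true if any node is CHAR or VARCHAR.
  static boolean needsConversion(TypeInfo type) {
    switch (type.getCategory()) {
      case PRIMITIVE:
        PrimitiveCategory pc = ((PrimitiveTypeInfo) type).getPrimitiveCategory();
        return pc == PrimitiveCategory.CHAR || pc == PrimitiveCategory.VARCHAR;
      case LIST:
        return needsConversion(((ListTypeInfo) type).getListElementTypeInfo());
      case MAP:
        MapTypeInfo map = (MapTypeInfo) type;
        return needsConversion(map.getMapKeyTypeInfo())
            || needsConversion(map.getMapValueTypeInfo());
      case STRUCT:
        return any(((StructTypeInfo) type).getAllStructFieldTypeInfos());
      case UNION:
        return any(((UnionTypeInfo) type).getAllUnionObjectTypeInfos());
      default:
        return false;
    }
  }

  private static boolean any(List<TypeInfo> types) {
    for (TypeInfo t : types) {
      if (needsConversion(t)) {
        return true;
      }
    }
    return false;
  }
}
```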
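
And the shape of the hoisting I have in mind: run that walk once at SerDe initialization and branch on a cached boolean per row. This is a sketch with a stand-in class, so the names (rowNeedsConversion, deserializeRow) are hypothetical, not the PR's actual wiring:

```java
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

// Sketch only: a stand-in class, not ParquetHiveSerDe's real initialize() path.
abstract class HoistedConversionSerDe {

  private boolean rowNeedsConversion; // cached once per SerDe instance

  // Initialization happens once per split/file, so the recursive schema walk
  // is paid N times for N files, never once per row.
  void initialize(TypeInfo rowType) {
    rowNeedsConversion = needsConversion(rowType);
  }

  // Per-row path: a boolean test, calling convert() only when actually needed.
  Object deserializeRow(Object row) {
    return rowNeedsConversion ? convert(row) : row;
  }

  abstract boolean needsConversion(TypeInfo type);

  abstract Object convert(Object row);
}
```

With this shape, the cost of the recursive check is independent of the row count, which addresses the per-row concern above.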
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]