amansinha100 commented on code in PR #3628:
URL: https://github.com/apache/hive/pull/3628#discussion_r983060737
##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java:
##########
@@ -91,11 +93,61 @@ public class ParquetHiveSerDe extends AbstractSerDe
     implements SchemaInference {
   private ObjectInspector objInspector;
   private ParquetHiveRecord parquetRow;
+  private ObjectInspectorConverters.Converter converter;
   public ParquetHiveSerDe() {
     parquetRow = new ParquetHiveRecord();
   }
+  // Recursively check if CHAR or VARCHAR types are used
+  private boolean needsConversion(TypeInfo type) {
Review Comment:
I'm thinking about the number of times needsConversion() and the subsequent
convert() would be called for a table scan that reads, say, N Parquet files,
each with m row groups. Doing these operations once per file or per row group
would be fine, but doing them on a per-row basis would be a performance hit.
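
For reference, a minimal sketch of what such a recursive CHAR/VARCHAR check could look like over Hive's TypeInfo tree. The serde2 TypeInfo classes below are real Hive types; the method body is my illustration of the idea, not the PR's actual code:

```java
import java.util.List;

import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.UnionTypeInfo;

final class CharVarcharCheck {

  // Walk a TypeInfo tree; true if any node is CHAR or VARCHAR.
  static boolean needsConversion(TypeInfo type) {
    switch (type.getCategory()) {
      case PRIMITIVE:
        PrimitiveCategory pc = ((PrimitiveTypeInfo) type).getPrimitiveCategory();
        return pc == PrimitiveCategory.CHAR || pc == PrimitiveCategory.VARCHAR;
      case LIST:
        return needsConversion(((ListTypeInfo) type).getListElementTypeInfo());
      case MAP:
        MapTypeInfo map = (MapTypeInfo) type;
        return needsConversion(map.getMapKeyTypeInfo())
            || needsConversion(map.getMapValueTypeInfo());
      case STRUCT:
        return any(((StructTypeInfo) type).getAllStructFieldTypeInfos());
      case UNION:
        return any(((UnionTypeInfo) type).getAllUnionObjectTypeInfos());
      default:
        return false;
    }
  }

  private static boolean any(List<TypeInfo> types) {
    for (TypeInfo t : types) {
      if (needsConversion(t)) {
        return true;
      }
    }
    return false;
  }
}
```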
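
And the shape of the hoisting I have in mind: run that walk once at SerDe initialization and branch on a cached boolean per row. This is a sketch with a stand-in class, so the names (rowNeedsConversion, deserializeRow) are hypothetical, not the PR's actual wiring:

```java
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

// Sketch only: a stand-in class, not ParquetHiveSerDe's real initialize() path.
abstract class HoistedConversionSerDe {

  private boolean rowNeedsConversion; // cached once per SerDe instance

  // Initialization happens once per split/file, so the recursive schema walk
  // is paid N times for N files, never once per row.
  void initialize(TypeInfo rowType) {
    rowNeedsConversion = needsConversion(rowType);
  }

  // Per-row path: a boolean test, calling convert() only when actually needed.
  Object deserializeRow(Object row) {
    return rowNeedsConversion ? convert(row) : row;
  }

  abstract boolean needsConversion(TypeInfo type);

  abstract Object convert(Object row);
}
```

With this shape, the cost of the recursive check is independent of the row count, which addresses the per-row concern above.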
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]