vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index
URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r398223341
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java
##########
@@ -103,6 +120,42 @@
return rowKeys;
}
+ /**
+ * Read the rows with record key and partition path from the given parquet file
+ *
+ * @param filePath The parquet file path.
+ * @param configuration configuration to build fs object
+ * @return Set Set of row keys matching candidateRecordKeys
+ */
+ public static List<Pair<Pair<String, String>, Option<HoodieRecordLocation>>> fetchRecordKeyPartitionPathFromParquet(Configuration configuration, Path filePath,
+     String baseInstantTime,
+     String fileId) {
+ List<Pair<Pair<String, String>, Option<HoodieRecordLocation>>> rows = new ArrayList<>();
+ try {
+ if (!filePath.getFileSystem(configuration).exists(filePath)) {
+ return new ArrayList<>();
+ }
+ Configuration conf = new Configuration(configuration);
+ conf.addResource(FSUtils.getFs(filePath.toString(), conf).getConf());
+ Schema readSchema = HoodieAvroUtils.getRecordKeyPartitionPathSchema();
+ AvroReadSupport.setAvroReadSchema(conf, readSchema);
+ AvroReadSupport.setRequestedProjection(conf, readSchema);
+ ParquetReader reader = AvroParquetReader.builder(filePath).withConf(conf).build();
Review comment:
I think we can fix it in this patch itself. It's a critical aspect.