nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##########
@@ -81,7 +105,52 @@ public String getPartitionPath(Row row) {
return getKey(genericRecord).getPartitionPath();
}
- void buildFieldPositionMapIfNeeded(StructType structType) {
+ /**
+ * Fetch partition path from {@link InternalRow}.
+ *
+ * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+ * @param structType schema of the internalRow.
+ * @return the partition path.
+ */
+ public String getPartitionPath(InternalRow internalRow, StructType structType) {
+ try {
+ Row row = deserializeRow(getEncoder(structType), internalRow);
+ return getPartitionPath(row);
+ } catch (Exception e) {
+ throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e);
+ }
+ }
+
+ private ExpressionEncoder getEncoder(StructType structType) {
+ if (encoder == null) {
+ encoder = getRowEncoder(structType);
+ }
+ return encoder;
+ }
+
+ private static ExpressionEncoder getRowEncoder(StructType schema) {
+ List<Attribute> attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream()
+ .map(Attribute::toAttribute).collect(Collectors.toList());
+ return RowEncoder.apply(schema)
+ .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(),
+ SimpleAnalyzer$.MODULE$);
+ }
+
+ private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row)
+ throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException {
+ // TODO remove reflection if Spark 2.x support is dropped
+ if (package$.MODULE$.SPARK_VERSION().startsWith("2.")) {
Review comment:
I could not find any.
```
grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
```
In most of these places we do have a static method to get the encoder
(getEncoder()), but deserializeRow is the first of its kind.
KeyGeneratorTestUtilities does have serializeRow, but it converts
Row -> InternalRow and it is part of the test code.
Here we need the opposite direction: InternalRow -> Row.
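
For illustration only, here is a Spark-free sketch of the version-gated reflection that deserializeRow relies on. LegacyEncoder, ModernEncoder, Deserializer and the String "rows" are hypothetical stand-ins, not Hudi or Spark classes. The method names fromRow and createDeserializer()/apply are assumed to mirror what the Spark 2.x and 3.x encoders expose, so treat them as assumptions rather than a statement of the Spark API.

```java
import java.lang.reflect.Method;

final class ReflectiveDeserializeSketch {

  // Hypothetical stand-in for a 2.x-style encoder that deserializes directly.
  static final class LegacyEncoder {
    public String fromRow(String internalRow) {
      return "row(" + internalRow + ")";
    }
  }

  // Hypothetical stand-in for the deserializer object a newer encoder hands out.
  static final class Deserializer {
    public String apply(String internalRow) {
      return "row(" + internalRow + ")";
    }
  }

  // Hypothetical stand-in for a 3.x-style encoder.
  static final class ModernEncoder {
    public Deserializer createDeserializer() {
      return new Deserializer();
    }
  }

  // Picks the call path from the version string and resolves methods by name
  // at runtime, so neither encoder shape is needed at compile time.
  static String deserialize(Object encoder, String version, String internalRow) {
    try {
      if (version.startsWith("2.")) {
        Method fromRow = encoder.getClass().getMethod("fromRow", String.class);
        return (String) fromRow.invoke(encoder, internalRow);
      }
      Method create = encoder.getClass().getMethod("createDeserializer");
      Object deserializer = create.invoke(encoder);
      // Look up apply(...) on the declared return type of createDeserializer().
      Method apply = create.getReturnType().getMethod("apply", String.class);
      return (String) apply.invoke(deserializer, internalRow);
    } catch (ReflectiveOperationException e) {
      // Keep the cause attached so the failing lookup is visible in the trace.
      throw new IllegalStateException("Reflective deserialization failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(deserialize(new LegacyEncoder(), "2.4.4", "a"));
    System.out.println(deserialize(new ModernEncoder(), "3.0.1", "b"));
  }
}
```

The sketch passes the exception as the cause instead of concatenating it into the message, which keeps the original stack trace; the same could be considered for the HoodieIOException in the diff, assuming it has a constructor that accepts a cause.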